DOCUMENT RESUME

ED 054 832                                        LI 003 108

TITLE          Research 1970/1971: Annual Progress Report.
INSTITUTION    Georgia Inst. of Tech., Atlanta. Science Information
               Research Center.
SPONS AGENCY   National Science Foundation, Washington, D.C.
PUB DATE       71
NOTE           127 p.; (130 References)

EDRS PRICE     MF-$0.65 HC-$6.58
DESCRIPTORS    Automation; Bibliographies; *Computer Science;
               *Information Processing; *Information Science;
               *Information Systems; Information Theory;
               Linguistics; *Research; Semiotics
IDENTIFIERS    *Systems Design



ABSTRACT

The report presents a summary of science information
research activities of the School of Information and Computer
Science, Georgia Institute of Technology. Included are project
reports on interrelated studies in science information, information
processing and systems design, automata and systems theories, and
semiotics and linguistics. Also presented in the report is a
description of the programs of the School of Information and Computer
Science, and a summary of research activities at the
Information/Computer Science Laboratory. The report concludes with a
bibliography of publications for the period 1970/71. (Author)




Work reported in this publication was performed at the School of Information and Com- 
puter Science, Georgia Institute of Technology. The primary support of this work came from the 
Georgia Institute of Technology and from agencies acknowledged in the report. 

Unless restricted by a copyright notice, reproduction and distribution of reports on research 
supported by Federal funds is permitted for any purpose of the United States Government. Copies of 
such reports may be obtained from the National Technical Information Service, 5285 Port Royal
Road, Springfield, Virginia 22151. 



The graduate School of Information and Computer Science of the Georgia Institute of 
Technology offers comprehensive programs of education, research and service in the information, 
computer and systems sciences. As part of its research activities the School operates, under a grant 
from the National Science Foundation, an interdisciplinary science information research center.
Correspondence concerning the programs and activities of the School may be addressed to Director,
School of Information and Computer Science, Georgia Institute of Technology, Atlanta, Georgia
30332. Telephone: (404) 873-4211.






GITIS-71-03 
(Research Report) 



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
OFFICE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR
OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE
OF EDUCATION POSITION OR POLICY.



RESEARCH 1970/1971: 
ANNUAL PROGRESS REPORT 



By the 




Science Information Research Center 




School of Information and Computer Science
1971

GEORGIA INSTITUTE OF TECHNOLOGY

Atlanta, Georgia 









ACKNOWLEDGEMENT 



The work reported in the paper has been sponsored in part by
the National Science Foundation Grant GN-655. This assistance
is gratefully acknowledged.

VLADIMIR SLAMECKA 
Project Director 





ABSTRACT 



The report presents a summary of science information
research activities of the School of Information and Computer
Science, Georgia Institute of Technology. Included are project
reports on interrelated studies in science information,
information processing and systems design, automata and systems
theories, and semiotics and linguistics. Also presented
in the report is a description of the programs of the School
of Information and Computer Science, and a summary of research
activities at the Information/Computer Science Laboratory.
The report concludes with a bibliography of publications for
the period 1970/71.



CONTENTS 



PREFACE

THE SCHOOL OF INFORMATION AND COMPUTER SCIENCE

STUDIES IN SCIENCE INFORMATION
    Structural Analysis of National Science Information Systems
    Predictive Models of Scientific Progress
    Extended Effects of Information Processes and Processors
    Automatic Classification of Indexed Monographs
    On Scientific-Technical Tape Information Services
    Automated Structuring of Natural Language Text
    Study of Concept-Based Grammars
    Toward a Theory of Mechanical Problem Analysis

STUDIES IN INFORMATION PROCESSING AND SYSTEMS DESIGN
    Interactive Preparation of State-of-the-Art Reviews
    Extending the Utility of Science Information to Education
    An Adaptive Spectrum Analysis Vocoder
    Computer Picture Processing
    Computer Structure for Description of Pictures
    Optimal Simultaneous Flow in Single Path Communications Networks
    Pre-scheduler and Management Model for Computer-User Systems
    Multiprogramming Scheduling
    Problems in Operating Systems Design

STUDIES IN AUTOMATA AND SYSTEM THEORIES
    The Algebraic Theory of Abstract Computers
    Abstract Computers and Degrees of Unsolvability
    Abstract Computers and Automata
    Combinatoric Computers and Their Algebras
    The Algebras of Programming Languages
    On Chomsky's Context-Sensitive Language Which Is Not Context-Free
    Research on Interaction Within Systems
    A Theory of Dynamic Group Behavior

STUDIES IN SEMIOTICS AND LINGUISTICS
    Information Measurement and Value
    Utilization of Semantic Information Measures (SIM)
    Taxonomy of Linguistic Structural Theories
    Studies of Natural Language
    Development of an Ostensive Calculus

THE INFORMATION AND COMPUTER SCIENCE LABORATORY

BIBLIOGRAPHY OF PUBLICATIONS, 1970/1971



PREFACE 



The present report is the third annual account of major 
activities of the Science Information Research Center at the 
School of Information and Computer Science, Georgia Institute 
of Technology. The report covers the 12-month period extend- 
ing from July 1970 to June 1971.* 

The Georgia Tech Science Information Research Center was 
established in 1967, under partial sponsorship of the National 
Science Foundation. The principal objective of the Center is 
to contribute, through research, to the body of knowledge in
information science, and to the utilization of scientific in- 
formation. An important second function of the Center is 
research training, accomplished through the participation of 
graduate students in the research activities of the Center. 

Since its establishment the Center has devoted a major 
part of its activities to two areas: studies in the scientific 
foundations of the discipline of information science, particularly
in the theory of information and its processes; and
the development of techniques for a more effective and efficient
exploitation of scientific information, both within and
outside the scientific community. During the past year the
Center began to formulate a modest program of studies viewing
science information as an instrument for science policy 
formulation. National concern with scientific knowledge has 
recently extended to issues of prudent policies for the manage- 
ment of knowledge as a national and international resource; 
central to these issues is not only the utilization of this 
resource but also its production, value, and effect on both 
science and other enterprises of society. In the years to 
come, the Georgia Tech Science Information Research Center 
proposes to emphasize research relevant to these problems.

The intent of this annual research report is to provide
capsule summaries of research activities and results, since
more formal publications in journals invariably incur delays. 
Included in this report are also a brief review of the present 
status of the academic programs of the School of Information 
and Computer Science, and a bibliography of issued or accepted 
publications of the School.



*For previous reports see (1) Slamecka, V. The Georgia Tech Science
Information Center: A Summary Report, 1967/1969. Atlanta, Ga.,
Georgia Institute of Technology (School of Information and Computer
Science), 1969. Research Report GITIS-69-18. 30 p.; and (2) Georgia
Institute of Technology, Science Information Research Center. Research
1969/1970: Annual Progress Report. Atlanta, Ga., Georgia Institute
of Technology (School of Information and Computer Science), 1970.
Research Report GITIS-70-09. 84 p.



The annual report highlights the main research activities 
of the Science Information Research Center; it does not comprise 
the full record. Particularly lacking in it is an indication of
the profound impact which the existence of the Center has had on
the student body of the School of Information and Computer Science,
on the Georgia Institute of Technology, and on the research and
education communities of the State of Georgia. It is also in
consideration of this effect that the faculty and students of
the School of Information and Computer Science express a sincere
acknowledgement to the sponsors of the Center: the Office of 
Science Information Service of the National Science Foundation, 
and the University System of Georgia, through the Georgia 
Institute of Technology. 



Atlanta, Ga. 
August, 1971 



VLADIMIR SLAMECKA 
Director 



THE SCHOOL OF INFORMATION AND COMPUTER SCIENCE 



The establishment of academic and research programs in the 
information, computer and systems sciences at the Georgia Institute 
of Technology has been a direct response to the concerns, voiced by
both the Congress of the United States and the scientific community,
with the management and utilization of man's knowledge. The response
was the joint effort of the National Science Foundation, charged by
Congress to attend to these concerns, and the Institute.

In 1961, under the sponsorship of the National Science Founda- 
tion, the Institute appointed a committee to study and develop long- 
range approaches to education in information science. The results 
of the committee’s work were reported at two national conferences, 
conducted at the Atlanta campus of the Institute in October 1961 
and April 1962, to a national audience of scientists, engineers, 
information specialists, librarians, educators and administrators.

Following the second conference, the Institute began to plan 
the establishment of degree programs in information science. Aca- 
demic programs leading to the degree of Master of Science in In- 
formation Science were endorsed in the early Fall of 1962 by both 
the Graduate Council of the Institute and the Board of Regents 
of the University System of Georgia, and formally opened in September 
1963 with generous assistance from the Division of Education of the
National Science Foundation. The doctoral program was inaugurated
in 1968. In 1970, the name of the School and the designation of
its degrees were changed to "Information and Computer Science."

The structure and content of the educational programs of the 
School were shaped by two types of need: the need of society for
individuals educated or trained to perform specific functions; and
the intrinsic need of a science to assure its own development and 
growth. The School perceived that its graduate programs must in- 
clude a strong component concerned with the development of a theo- 
retical base for the information-based professions; and it was 
equally obvious that they must not ignore the social mandate to 
supply professional personnel and effective methods capable of 
controlling and improving the functions of information management 
and transfer. Thus both theoretical and professional programs 
were indicated and implemented, forming a solid and logical founda- 
tion for continued development of the discipline, and justifying 
the societal role of the School. 

At the present time, seven years since its establishment, the 
School of Information and Computer Science is the largest graduate 
department of the Georgia Institute of Technology, in terms of
students enrolled and graduated. 






The School currently offers extensive educational programs 
leading to the designated degrees of Doctor of Philosophy and Master 
of Science. The objective of the doctoral program is to prepare 
individuals for research or academic careers; the program is therefore
theoretically oriented, and much emphasis is placed on the
students' early involvement in research. The M.S. programs, on the
other hand, are applied in character, seeking to educate competent
professionals for two types of career: information systems engineer- 
ing, and computer systems engineering. 

Several other academic programs are also offered by the School 
in information/computer science and/or engineering, including: an
off-campus M.S. degree program; an evening M.S. degree program; an
undesignated B.S. degree program; an Institute-wide "minor" for the
undergraduate division of Georgia Tech; and a curriculum in
information/computer science for high school teachers. Among new programs
scheduled for establishment in 1972 are a full graduate-level program
in Biomedical Information and Computer Science, to be offered jointly
with the School of Medicine, Emory University; and an extensive,
formal baccalaureate degree program in information and computer
science.

Table 1 summarizes the educational activities and programs of 
the School of Information and Computer Science since 1964. Tables 2 
and 3 offer a statistical overview of the development of the School 
from the viewpoints of student enrollment and faculty staffing. 
Finally, a list of courses offered by the School is shown in Table 4. 



References 

1. Slamecka, V. "Georgia Institute of Technology, School of
   Information and Computer Science." In: Kent, A., and
   Lancour, H. (eds.) Encyclopedia of Library and Information
   Science. New York, Dekker, 1972. Volume IX (in printing).

2. Georgia Institute of Technology. Graduate Catalogue and
   Announcements, 1971/1972. Pp. 118-125.

3. Slamecka, V., and Zunde, P. "A University-Wide Program in the
   Information Sciences." Proceedings of the IFIP World
   Conference on Computer Education, Amsterdam (Holland),
   1970. Vol. II, pp. 229-234.



Table 1. Educational Programs of the School of Information and Computer Science,
1964-1972

Program                                                         Degree          Year of         Enrollment in
                                                                                Implementation  Fall Quarter 1970

Information Systems Engineering                                 M.S.            1964            38
Computer Systems Engineering                                    M.S.            1966            43
Undergraduate Service Curriculum                                Non-degree      1966            619
Information and Computer Science                                Ph.D.           1968            13
Off-Campus Program at Lockheed-Georgia Co.                      M.S.            1969            16
Undergraduate Minor in Information and Computer Science         Non-degree      1970            30
Curriculum in Information Sciences for High School Teachers     Non-degree (a)  1970            24
Information/Computer Systems Engineering (evening program)      M.S.            1971            30 (est. in 1971)
Graduate Program in Biomedical Communication                    M.S. (b)        1967            30 (total graduates)
Graduate Program in Biomedical Information Science (c)          M.S., Ph.D.     1971/72         20 (est. in 1972)
Degree Program in Undergraduate Information/Computer Science    B.S.            1972            100 (est. in 1973)

(a) Applicable to the M.A. degree in Education
(b) Offered through a consortium of universities, administered by Tulane
    University; ceased in 1970
(c) Jointly with Emory University, School of Medicine




Table 2. Student Enrollment, 1964-1972

Fiscal Year     Total Enrollment    Total New    Degrees Awarded

1964/65         30                  22           1
1965/66         38                  15           23
1966/67         65                  40           12
1967/68         108                 46           34
1968/69         137                 53           51
1969/70         155                 84           40
1970/71         185                 102          47
1971/72*        200                 110          60

*Estimated



Table 3. Faculty/Staff Development, 1964-1972

                                      Equivalent Full-Time*

Fiscal Year       Faculty:        Faculty:       Pre-Doctoral    Graduate      Other    Total
                  Professorial    Instructors    & Lecturers     Assistants    Staff

1964/65           3.44            -              -               1.55          2.26     7.25
1965/66           2.72            -              .68             .39           1.00     4.79
1966/67           4.60            .19            .77             2.84          2.36     10.76
1967/68           6.44            1.64           1.38            4.70          5.67     19.83
1968/69           7.27            1.23           3.36            3.29          5.66     20.81
1969/70           9.34            1.75           3.19            4.55          7.59     26.42
1970/71           10.80           1.39           3.72            3.90          7.72     27.53
1971/72 (Est.)    13.19           2.00           3.40            2.96          8.00     29.55

*1.00 equals full-time for one fiscal year




Table 4. Approved Courses Offered by School of ICS

Course             Title

ICS 110            Information, Computers, Systems: An Orientation
ICS 151            Digital Computer Organization and Programming
ICS 215            Technical Information Resources
ICS 251            Automatic Data Processing
ICS 256            Computer and Programming Systems
ICS 310            Computer-Oriented Numerical Methods
ICS 325            Introduction to Cybernetics
ICS 336            Introduction to Information Engineering
ICS 342            Introduction to Semiotics
ICS 355            Information Structures and Processes
ICS 401/402        Languages for Science and Technology
ICS 404            Topics in Linguistics
ICS 406            Computing Languages
ICS 410            Problem Solving
ICS 415            The Literature of Science and Engineering
ICS 423            Mathematical Techniques for Information Science
ICS 424            Elements of Information Theory
ICS 436            Information Systems
ICS 445            Logistic Systems
ICS 452            Logic Design and Switching Theory
ICS 458            Computer Systems
ICS 607            Communication and Control of Information
ICS 608            Syntax of Natural Languages
ICS 609            Mathematical Linguistics
ICS 612            Graph Theory
ICS 616            Information Control Methods
ICS 621            Theory of Communication
ICS 625            Cybernetics
ICS 626            Information Processes I
ICS 627            Information Processes II
ICS 628            Theory of Models
ICS 629            Information Measures
ICS 632            Equipment of Information Systems
ICS 636/637        Information Systems Design I, II
ICS 638            Problems in Systems Design
ICS 642            Advanced Semiotics
ICS 645            Advanced Logic
ICS 646            Philosophy of Mind
ICS 647            Artificial Intelligence
ICS 652            Advanced Computer Organization
ICS 653            Computer Techniques for Information Storage and Retrieval
ICS 656            Computer Operating Systems
ICS 657            Design of Computer Operating Systems
ICS 658            Evaluation of Computer Systems
ICS 661            Computer Language Design
ICS 673            Organization and Management of Information Industry
ICS 682            System Theory I
ICS 683            System Theory II
ICS 700            Master's Thesis
ICS 701/702/703    Seminar
ICS 704/705/706    Special Problems in Information Science
ICS 704            Combinatory Logic and the Calculi of Lambda-Conversion
                   (Special Problems Course)
ICS 706            Pattern Recognition (Special Problems Course)
ICS 706            Management Information Systems Design (Special
                   Problems Course)
ICS 710            Philosophy of Language
ICS 726            Theory of Automata
ICS 736            Information Systems Optimization
ICS 738            Advanced Systems Design
ICS 761            Syntax-Directed Compilation
ICS 799            Ph.D. Dissertation Preparation
ICS 800            Doctor's Thesis



STUDIES IN SCIENCE INFORMATION 



Structural Analysis of National Science Information Systems
Predictive Models of Scientific Progress
Extended Effects of Information Processes and Processors
Automatic Classification of Indexed Monographs
On Scientific-Technical Tape Information Services
Automated Structuring of Natural Language Text
Study of Concept-Based Grammars
Toward a Theory of Mechanical Problem Analysis




Structural Analysis of National Science Information Systems

V. Slamecka, P. Zunde, D. H. Kraus

Surveillance of the development of national science information
systems in six countries with planned economies (Bulgaria,
Czechoslovakia, Hungary, Poland, Romania, and Yugoslavia) was continued
in 1970/1971 following the publication of a series of reports and
papers [1-8] that concluded the first phase of the study. The
initial study had produced detailed descriptive and, when possible,
quantitative surveys of the development and apparent effectiveness
of the national systems for scientific, technical, and economic
information in the six countries.

In 1970, the M.I.T. Press offered to publish an abridged and
updated version of the reports. It was felt desirable to visit the
major information centers in each of the countries of the study to
procure first-hand information as well as literature generally not
available in the United States. Partial support for such travel
was provided by NSF Travel Grant GN-885, and the visits were made
during 1970. Interviews were granted and literature furnished or
promised in four of the six countries (Bulgaria, Hungary, Poland,
Romania) and in Paris by a UNESCO expert on the information systems
of Eastern Europe. Literature for updating and revising the reports
has arrived since that time. Detailed information on the revisions
of the Romanian information system was received, enabling the
researchers to issue a revised version of the full report on Romania
[9]. In addition, a special study was made of the Bulgarian library
and documentation system [10].

Work on the text of the book for the M.I.T. Press was begun
early in 1971. Tentatively entitled A Guide to Scientific, Technical,
and Economic Information and Documentation in Eastern Europe,
the book will contain the following: a summary of the origins and
evolution, current status, and developmental trends of information
and documentation in the six countries of the study; a survey of
international cooperation in information transfer and dissemination
in the CMEA countries (the aforementioned six, the Soviet Union,
Mongolia, and Cuba); a report on the International Scientific and
Technical Information Center in Moscow; and brief accounts, by
country, of the development, current status, and special features
of scientific, technical, and economic documentation and information
systems in each country, followed by a directory of its information
and documentation centers, and a list of the publications of these
centers. A bibliography will be supplied for each chapter and a
subject index will be prepared for the volume.



References 



1. Kraus, D. H. Scientific and Technical Documentation and
   Information in Bulgaria. Atlanta, Ga., Georgia Institute
   of Technology (School of Information and Computer Science),
   1968. Research Report. 79 p. (PB 179 981.)

2. Kraus, D. H. Scientific, Technical, and Economic Documentation
   and Information in Czechoslovakia. Atlanta, Ga., Georgia
   Institute of Technology (School of Information and Computer
   Science), 1968. Research Report. 298 p. (PB 179 982.)

3. Kraus, D. H. Scientific, Technical, and Economic Documentation
   in Hungary. Atlanta, Ga., Georgia Institute of Technology
   (School of Information and Computer Science), 1968. Research
   Report. 144 p. (PB 179 983.)

4. Kraus, D. H. Scientific and Technical Documentation and
   Information in Romania. Atlanta, Ga., Georgia Institute of
   Technology (School of Information and Computer Science),
   1968. Research Report. 91 p. (PB 179 985.)

5. Kraus, D. H. Scientific and Technical Documentation and
   Information in Yugoslavia. Atlanta, Ga., Georgia Institute
   of Technology (School of Information and Computer Science),
   1968. Research Report. 120 p. (PB 179 986.)

6. Slamecka, V., Zunde, P., and Kraus, D. H. "On the Structure
   of Six National Science Information Systems," International
   Forum of Informatics, VINITI, Moscow, 1969, pp. 318-334.

7. Zunde, P. Scientific and Technical Documentation and Information
   in Poland. Atlanta, Ga., Georgia Institute of Technology
   (School of Information and Computer Science), 1968. Research
   Report. 345 p. (PB 179 984.)

8. Zunde, P. "Co-operation of the Information Agencies of the CMEA
   Countries," International Library Review, (1969), 1:513-535.

9. Kraus, D. H. Scientific and Technical Documentation and
   Information in Romania, 1970. Atlanta, Ga., Georgia Institute
   of Technology (School of Information and Computer Science),
   1970. Research Report GITIS-71-02. 75 p.

10. Kraus, D. H. "Bulgaria, Libraries in," Encyclopedia of Library
    and Information Science, New York, M. Dekker, 1970. Vol. 3,
    pp. 471-484.



Predictive Models of Scientific Progress

P. Zunde, V. Slamecka

Progress in science is essentially determined by the stimulating
effects of information accumulation and transfer. Scientific
communication transmits and disseminates generated information among
information generators, actual or potential, and stimulates them to
produce new information. It is therefore reasonable to postulate
that the rate of development in a particular field of science
depends on the intensity of information generation in that field and on
the intensity of stimulation or incitement both from within that
discipline and from without, by contributions from other disciplines.
In other words, the more scientific activity taking place in a
particular scientific discipline, such as physics, and the more these
activities are stimulated by scientific work in various other
disciplines, the greater we can expect the rate of development of
this discipline to be. We can assume that the intensity of information
generation in various disciplines as well as the intensity of
the stimulation effect of generated information can be measured with
some accuracy (and, indeed, we have shown as part of this research
project that such measurements are feasible at least under certain
simplifying assumptions).

It is further hypothesized that although the stimulating effect
of certain information generators (sources) might be of long range,
the immediate predecessors in the information generation process
are on the average the most significant stimulants or incitors and
that they represent in a sense the accumulated effects of the whole
past history. This assumption is strongly supported by the evidence
gained from research on literature citations, which shows that
literature cited in scientific publications is, for all practical
purposes, limited to one or two decades preceding the publication date
of the citing document [1].

Under these assumptions, the thrust of this research effort
has been to model the process of science development as a Markov
chain defined on the relative intensity of scientific productivity
in a set of scientific disciplines as states, the state transition
probabilities corresponding to the estimated degrees to which
scientific production in one discipline stimulates scientific
activity in another. Figuratively speaking, the process of science
development is considered to be a result of a stimulating action
of global stimulators or "incitors" in the sense that the rate of
the development in a scientific discipline at time t + Δt is a
function of the relative intensity of the incitor action in that
discipline at the time t and of the change of emphasis of the
incitor action relative to other disciplines during the time Δt.
Clearly, this cumulative action consists of all the contributions
of individual incitors which thus provide the impact for further
scientific activity.



Specifically, let us now consider the whole field of human
knowledge, or some portion of it, divided into a number of
disciplines, such as physics, chemistry, mathematics, medicine,
social science, etc. Let $G_i = \{g\}$ be a set of stimulators in
the $i$th scientific discipline and let $D = (D_1, D_2, \ldots, D_n)$ be an
ordered set of scientific disciplines. Associated with it is a
set of nonnegative real numbers $p_i(t)$, $i = 1, 2, \ldots, n$, such that

\[ 0 \le p_i(t) \le 1, \qquad \sum_{i=1}^{n} p_i(t) = 1. \]

These numbers are interpreted as the relative intensity of scientific
"incitence" or stimulation in each of the scientific disciplines
under consideration.

Further let $0 \le p_{ij} \le 1$ be another set of nonnegative real
numbers which represent the relative intensity of "incitence" which
stimulators in the discipline $D_i$ exert on the scientific activity
in the discipline $D_j$.

Under the assumption that scientific progress can be modeled
as a first-order Markov process, it is then completely defined by
the initial distribution, which we shall write in vector form as

\[ p(0) = [\, p_1(0), p_2(0), \ldots, p_n(0) \,], \qquad p_i(0) \ge 0, \qquad \sum_{i=1}^{n} p_i(0) = 1, \]

and by the matrix of transition probabilities

\[ \begin{pmatrix}
p_{11} & p_{12} & \cdots & p_{1n} \\
p_{21} & p_{22} & \cdots & p_{2n} \\
\vdots & \vdots &        & \vdots \\
p_{n1} & p_{n2} & \cdots & p_{nn}
\end{pmatrix} \]

such that

\[ \sum_{j=1}^{n} p_{ij} = 1, \qquad p_{ij} \ge 0, \qquad i, j = 1, 2, \ldots, n. \]







Application of this Markov chain model of science development
has been demonstrated on a sample of citation data provided in an
article by Earle and Vickery [2]. Our analysis was limited to the
subset of data which gives the count of social science subjects
only. The sample population consisted of 13,412 citations from
897 books and periodicals, of which 256 books and 75 periodical
titles were classed as social science items. Analysis of the
data in terms of the proposed model showed a clear tendency of
shifting emphasis of scientific inquiry from science and technology
to social science subjects [3].

However, although the conclusions obtained from the model
seem reasonable and intuitively founded, the validity of the
model can be best verified by future data.
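One simple way to obtain transition probabilities from citation counts is sketched below. This is an assumption made for illustration, not the estimation procedure of the study: each row of a square table of counts is divided by its row total. The function name `transition_matrix`, the reading of `counts[i][j]` as the number of citations that work in discipline j makes to work in discipline i, the treatment of an empty row, and the numbers are all hypothetical.

```python
def transition_matrix(counts):
    """Row-normalize a square table of nonnegative citation counts.

    counts[i][j] is read here as the number of citations that work in
    discipline j makes to work in discipline i (an assumed convention).
    """
    n = len(counts)
    matrix = []
    for i, row in enumerate(counts):
        if len(row) != n:
            raise ValueError("counts must be a square table")
        if any(c < 0 for c in row):
            raise ValueError("counts must be nonnegative")
        total = sum(row)
        if total == 0:
            # Assumption: a discipline with no recorded citations is treated
            # as stimulating only itself, so the row still sums to 1.
            matrix.append([1.0 if j == i else 0.0 for j in range(n)])
        else:
            matrix.append([c / total for c in row])
    return matrix


# Hypothetical counts for three disciplines (not the Earle and Vickery data).
COUNTS = [
    [80, 15, 5],
    [20, 60, 20],
    [0, 10, 90],
]

if __name__ == "__main__":
    for row in transition_matrix(COUNTS):
        print([round(x, 2) for x in row])
```

Each resulting row is nonnegative and sums to 1, which is the condition the model places on the matrix of transition probabilities.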



References 

1. Cole, P. F. "Journal Usage Versus Age of Journal."
   J. Documentation, 19:1 (1963).

2. Earle, P., and Vickery, B. "Social Science Literature
   Use in the U.K., as Indicated by Citations."
   J. Documentation, 25:123 (1969).

3. Zunde, P., and Slamecka, V. "Predictive Models of Scientific
   Progress." To appear in Information Storage and Retrieval.



Extended Effects of Information Processes and Processors 



V. Slamecka, L. Chiaraviglio, W. T. Jones

The purpose of this project has been to obtain theories of
information processes and phenomena that will furnish some clues
as to the possible development of knowledge in an environment
which contains an increasing number of artificial processors
and other knowledge amplifying facilities. We are particularly
interested in the pathologies that could be engendered by such
amplification of the information processing capabilities of
mankind.



The project is conceived as having two phases. The first
phase, and the one in which we have been engaged up to this point,
is concerned with studying the phenomena of information; the goal
of this phase is to obtain phenomenal theories of the information
processes. The second phase of the project has as a goal the
development of theories of the causal connections that underlie
the observed processes. At present there exist both causal and
phenomenal theories which are able to account for some selected
aspects of the information processes.



A variety of statistical models have been employed to achieve
descriptions of some of the important aspects of the phenomena of
information. Mathematical models originally developed for the
purpose of accounting for epidemic, branching and neural net
processes have been adapted to the description of some information
processes [1-3]. These models are probably the most sophisticated
theories that we have for dealing with some of the phenomena of
information spread and growth.



It seems to us that many of the information processes of
interest are analogous to the processes studied by geneticists
and population biologists. The success obtained in adapting
epidemic, branching, and neural net models to information processes
is evidence that the analogies may be fundamental. Indeed, these
models can be viewed as special cases of some of the models employed
by biologists.



In the epidemic and neural net models, the factor transmitted
and the connection made are taken to be of a single type. The
population of individuals that results at any stage of these
processes is classified as a function of this single type of factor.



Branching models are able to account for any finite set of types
of factors. At any stage of the process we can obtain for each
individual in the population the probabilities that the individual
will have received any one of the factors in the set. The three
types of models seem to be special cases of the models used in
genetics. Epidemic and neural net models correspond to genetic
models in which only one genetic factor is involved. Branching
models correspond to genetic models which involve a set of
unlinked factors. None of these models takes into account either
linkages among the factors or selective pressures that act
differentially on them.
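
The multi-factor bookkeeping of a branching model can be illustrated with a short sketch. It is not taken from the project: it uses the standard property of a multitype branching process that expected counts per type are propagated by a mean-offspring matrix, and the function name `expected_counts` and the two-factor matrix `M` are hypothetical values chosen only for illustration.

```python
def expected_counts(initial, mean_matrix, generations):
    """Expected number of carriers of each factor type after some generations.

    mean_matrix[i][j] is the expected number of type-j carriers produced
    by one type-i carrier in a single generation (an assumed input).
    """
    counts = list(initial)
    k = len(counts)
    for _ in range(generations):
        # One generation: every type i contributes counts[i] * mean_matrix[i][j]
        # new carriers of type j.
        counts = [sum(counts[i] * mean_matrix[i][j] for i in range(k))
                  for j in range(k)]
    return counts


# Hypothetical two-factor example (values are illustrative, not measured).
M = [[1.5, 0.5],
     [0.0, 1.0]]

print(expected_counts([1, 0], M, 2))
```

Because the factors here evolve independently of one another, the sketch shares the limitation noted above: it represents neither linkage among factors nor selective pressure acting on them.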

We believe that it is worthwhile to try to adapt some of the
concepts of genetics to information processes. Historical data
is accumulating in the areas of the growth of science, the
deployment of citations and other bibliometric processes. As a
first step in the adaptation, it seems reasonable to develop
for this type of data the information analogue of the concept
of phenotype. For example, in the case of citation data, analogues
of genetic maps would seem to be intuitively meaningful and not
too hard to construct. With such "information maps" it may be
possible to discriminate populations which with respect to the
deployment of citations over time act in strict analogy to
biological populations. This would enable us to give the concept
of populations that share information (citations, in this example)
a precise content.

One of the difficulties of the study of information processes
is the construction of experiments that will illuminate the
historical data we have. It may be noted that there was a time
when biology was largely composed of natural history and there
were relatively few experimental systems that could be used to
illuminate the observed processes. Information science is
presently in a similar situation. Genetic concepts allowed biology
to go beyond natural history. A set of concepts must be found
which will serve the same purpose in information science.

The development of a theory of information processes may
parallel the development of biology. Our first need is a
phenomenal theory of the processes that will focus our attention on
the phenomenal invariants. Biology took great strides towards
genuine causal theories with the classical work that associated
physiological correlates to the phenomenal genetic markers. The
recent advances in biochemical genetics have completed the main
outlines of a causal theory of genetics. These advances were
made possible by the highly developed phenomenal theories.
Similarly, causal theories of information may have to wait for
more highly developed phenomenal theories before significant
bridges can be made to the "physiology" and "chemistry" of
information processors.

There exists a body of knowledge concerned with the internal
morphology and microprocesses that constitute many artificial and
natural processors. In the case of artificial processors we have
the obvious facility of accessing their "innards" to a degree that
is still unachievable for most organisms. Yet there is not much
that can be said about how the constituent microprocesses and
morphology of processors give rise to the observed information
phenomena. It is our opinion that the lack resides on the side
of the analysis of information phenomena. Thus as a first step
we are aiming to construct theories that will give us more
powerful tools for the analysis of phenomena.



References 

1. Goffman, W., and Newill, V. A. "Communication and Epidemic
   Processes." Proceedings of the Royal Society, 298:316-334
   (April 1967).

2. Bellman, R. (ed.) Multitype Branching Processes. New York,
   American Elsevier, 1971.

3. Rapoport, A., and Rebhun, I. "On the Mathematical Theory of
   Rumor Spread." Bull. Math. Biophysics, 14:375-383 (1952).



Automatic Classification of Indexed Monographs 

P. Zunde, P. J. Zando 

The hypothesis which has been under investigation is that decisions
involved in classifying monographs according to the Library of Congress
or some other comparable classification scheme can be made on the basis
of their internal (book) indexes and that a computer procedure can be
designed to perform this task automatically.

In the first phase of the research, one class was selected from
each of four diverse subject areas of the Library of Congress
classification schedule, namely

   (1)  MATHEMATICS
        Algebra
        QA 266    Abstract Algebra

   (2)  ECONOMIC THEORY
        HB 501    Capital. Saving.

   (3)  TRANSPORTATION AND COMMUNICATION
        Railways. United States
        HE 2751   History. Statistics

   (4)  PHILOLOGY. LINGUISTICS
        Language: General Works. Introduction.
        Philosophy. History, Comparative Philology
        Science of Language. Origin, etc., of Language.
        P 121     General Works. Methodology.


Random samples of three books were selected from each of the above
categories and their indexes were compared for occurrence of identical
terms. As a result of this comparison, similarity and analysis of
variance values were calculated both within each of the four classes
and between the samples of these classes.

In the second phase of research, additional classes were chosen
such that for each one of the original classes, two more classes from
the same subject areas were chosen as close as possible in subject
matter coverage to the subject matter of the original classes within
the LC classification system (except for TRANSPORTATION AND
COMMUNICATION, in which area only one additional class was chosen).





The additional classes were: 



   (1A) MATHEMATICS
        QA 266    Matrices

   (1B) MATHEMATICS
        Machine Theory. Abstract Machines
        QA 267.5  Special types, A-Z.

   (2A) ECONOMIC THEORY
        Price.
        HB 221    Theory of Price.

   (2B) ECONOMIC THEORY
        HB 601    Profit. Income.

   (3A) TRANSPORTATION AND COMMUNICATION
        Railways. United States
        HE 2791   By railroad (or company), A-Z.

   (4A) PHILOLOGY. LINGUISTICS
        Language: General works, etc.
        Science of Language, Philosophy, etc.
        P 105     1860/80-

   (4B) PHILOLOGY. LINGUISTICS
        Language: General works, etc.
        Science of Language, Philosophy, etc.
        P 123     General Special (e.g. Analogy)



As in the first phase, random samples of three books were taken
from each of these additional classes and their indexes compared for
similarity both within and between the classes. Measures of similarity
were then calculated for each pair of books.
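
A pairwise measure based on the co-occurrence of identical index terms can be sketched as follows. The report does not state which similarity coefficient was used; the Jaccard ratio below is an assumed stand-in, and the function name `index_similarity` and the three tiny sample indexes are hypothetical.

```python
def index_similarity(index_a, index_b):
    """Share of identical terms between two back-of-book indexes.

    Assumption: Jaccard ratio |A and B| / |A or B| on case-folded terms.
    The coefficient actually used in the study is not given in the text.
    """
    a = {term.strip().lower() for term in index_a}
    b = {term.strip().lower() for term in index_b}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical miniature indexes, for illustration only.
book_1 = ["Group", "Ring", "Field", "Ideal"]
book_2 = ["group", "ring", "Lattice", "Module"]
book_3 = ["Interest", "Capital", "Saving"]

print(index_similarity(book_1, book_2))  # two shared terms out of six distinct
print(index_similarity(book_1, book_3))  # no shared terms
```

Values computed within a class and between classes could then be compared, which is the kind of contrast the analysis of variance described above is meant to quantify.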

It was concluded that enough variance was present between indexes
of books of even closely related classes and that the similarity (or
association) measures based on the co-occurrence of identical terms can
be effectively used for discrimination purposes.






In the third and last phase of research, a profile for each of
the eleven classes (1), (1A), (1B), (2), (2A), (2B), (3), (3A), (4),
(4A), and (4B) was constructed by cumulating the indexes of three sample
books in each class and weighting the individual terms by the frequency
of their occurrence in the indexes. It was hypothesized that those
profiles are valid intensive descriptions of the above-named classes
of the Library of Congress classification system and that these
descriptions are distinct and discriminative enough to be used as
decision criteria to assign books in terms of "best match" to LC
classes by comparing their indexes with these class profiles. The
hypothesis was tested by arbitrarily selecting one book (not used in
previous samples) from each of the eleven classes listed above and
by using discriminant analysis to obtain similarity measures and
confidence limits for maximum likelihood decision to assign those
books to appropriate classes. The procedure is easily adaptable to
full-scale computer processing and automation of classification.
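
The profile construction and "best match" assignment can be sketched in a few lines. This is a simplified illustration only: it replaces the discriminant analysis and confidence limits used in the study with a plain weighted-overlap score, and the names `build_profile`, `match_score`, `best_match` and all sample terms are hypothetical.

```python
from collections import Counter


def build_profile(indexes):
    """Cumulate the indexes of the sample books of one class.

    Each term is weighted by the number of sample indexes it occurs in.
    """
    profile = Counter()
    for index in indexes:
        profile.update({term.lower() for term in index})
    return profile


def match_score(index, profile):
    """Fraction of the profile's total weight covered by the book's terms."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    terms = {term.lower() for term in index}
    return sum(profile[term] for term in terms) / total


def best_match(index, profiles):
    """Label of the class profile with the highest score (assumed rule)."""
    return max(profiles, key=lambda label: match_score(index, profiles[label]))


# Hypothetical three-book samples for two of the classes.
profiles = {
    "QA 266": build_profile([["group", "ring", "field"],
                             ["group", "ideal", "ring"],
                             ["group", "module"]]),
    "HB 501": build_profile([["capital", "saving", "interest"],
                             ["capital", "investment"],
                             ["saving", "interest"]]),
}

new_book = ["Group", "Field", "Homomorphism"]
print(best_match(new_book, profiles))
```

A score of this kind gives only a ranking; the confidence limits mentioned above would require the statistical treatment that this sketch leaves out.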

The results of the test validated the hypothesis to a highly 
satisfactory degree. 



References

1. U.S. Library of Congress. Subject Cataloging Division.
   Classification. Washington, D.C., GPO.



On Scientific-Technical Tape Information Services 

V. Slamecka, J. Gehl

The purpose of this study was to trace the outline of some
of the main features of scientific-technical tape services which
have developed during recent years, and to exhibit commonalities
and variations of characteristics of those services. The data
base utilized for this analysis was provided by Kenneth D. Carroll's
Survey of Scientific-Technical Tape Services [1].

The principal sponsors of the fifty-five active, commercially
available services listed in the Carroll report are learned and
professional societies, publishing firms, and commercial
organizations. Virtually all of the organizations involved use their data
bases to produce one or more publications; these are bibliographies,
indexes, abstracts, thesauri, keyword supplements, patent review
books, data books, or similar products under different names.

The subjects covered by the services span almost the entire
range of scientific knowledge. (However, coverage is not equally
balanced from subject to subject; chemistry and chemical
engineering, for example, are specifically covered by eleven different tape
services.) The rather crude measuring unit developed for determining
the amount of information provided by the services is the number of
items cited per tape. Of the total number of services for which
information on this question is available, approximately one half
cite more than 5,000 source items on each tape. Two of these in
fact cite more than 20,000 such items. One is ICRS, the Index
Chemicus Registry System tape, which cites 4,000 abstracts and
17,000 Wiswesser Line Notations on each monthly tape, for a total
of 21,000 items; the other is Predicasts Corporation's F&S Index
of Corporations and Industries, which includes approximately
25,000 source citations on each of its quarterly tapes.

Considering the wide variety of topics covered by these
tape services, it is not surprising to find that the number of
tapes issued each year is quite different from service to service.
Virtually every conceivable time interval is represented: weekly
issues, three issues a month, biweekly issues, semimonthly, monthly,
eleven issues a year, quarterly, every four months, semiannually,
and annually.

Combining the information available on both the average number 
of source items cited on a tape, and the frequency of tape issues, 
it may be concluded that almost half of the services cite more 
than 25,000 source items annually. Of this group, seven cite more 
than 200,000 items annually, and of those seven, there are two 
which cite more than 300,000. 
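
The combination of items per tape and issue frequency is simple arithmetic, shown here for the two services whose per-tape figures are quoted above. The function name `annual_items` and the frequency table are illustrative; only the per-tape counts and issue frequencies come from the text.

```python
# Issues per year for the two frequencies used in this example.
ISSUES_PER_YEAR = {"monthly": 12, "quarterly": 4}


def annual_items(items_per_tape, frequency):
    """Annual volume = items cited per tape x tapes issued per year."""
    return items_per_tape * ISSUES_PER_YEAR[frequency]


# ICRS: 4,000 abstracts + 17,000 Wiswesser Line Notations per monthly tape.
icrs = annual_items(4_000 + 17_000, "monthly")

# Predicasts: approximately 25,000 source citations per quarterly tape.
predicasts = annual_items(25_000, "quarterly")

print(icrs)        # 252000
print(predicasts)  # 100000
```

On these figures the monthly ICRS tape, although smaller per issue than the quarterly Predicasts tape, yields the larger annual volume, which is why both quantities have to be considered together.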



Of course, the cost of all this information is not always
low. However, more than 75% of the services are offered at annual
costs of $2,500 or less.

A large portion of current data bases are devoted to coverage
of the journal literature. More than three out of four services
are such that 50% or more of their data bases are devoted to journal
coverage, and almost two out of three are such that journal coverage
accounts for at least 80% of their total data base volume; a quite
large percentage of this journal coverage is accounted for by
English-language literature, and only one of the services surveyed
indicated that its data base is predominantly (i.e., more than 50%)
in a language other than English.

Approximately one out of three of the scientific-technical
services for which information is available indicated that at least 
some part of their data base is devoted to coverage of the reports 
literature, but only three out of twenty devoted more than 10% of 
their data base to such coverage. 

Also, nine services devote at least 25% of their data base to
patent literature coverage; of those nine, there are four which
are devoted exclusively to that purpose. Finally, three of the
scientific-technical tape services are devoted exclusively to the
coverage of papers presented at conferences, and the data bases
maintained by three others of the services are devoted to
statistical or historical data.

Techniques for searching the various data bases differ
considerably from one tape service to the next. Beyond such standard
items as author, title, and basic bibliographic information, most
services allow searching of the data base in various other ways.
Thus, searchable data elements for one or more services include:
descriptors (with or without links and roles); keyword phrases;
words in a document's abstract; the language in which a document
is written; primary and secondary subjects of a document;
indexing terms and title enrichment terms; and classification codes.

With reference to approximately one out of three presently
available tape services, a tape subscriber will be required to
develop his own software. In such a case, the subscribing
institution will use the tapes strictly as additional input to its own
system, and will use its own software and its own search strategy.
However, in the remaining cases, the institution which produces
the tape either already offers supporting software to its
subscribers, or will develop whatever software is required for an
interested customer. Also, some tape services have indicated
that, although they do not themselves offer supporting software,
various suitable search programs are available elsewhere on the
commercial market. In addition, a number of the organizations
producing scientific-technical tapes now offer, or are planning
to offer, in-house search services: retrospective, SDI, or both.



Of course, the premise which underlies the utility and
validity of any comparative survey such as the one resulting
from this study [2] is the necessity and sufficiency of the
parameters in terms of which such comparisons are made.
However, the premise may be unjustified; for we do not know whether
the parameters of comparison are useful for either of the two
major clients interested in surveys: those attempting to
select the best service for their needs, and those seeking to
pool several tapes for a wider and more efficient service. Nor
do we have any evidence that a much larger number of parameters
(such as prepared by Schwartz [3]) can be employed to construct
a decision-making algorithm for either category of potential
users, even if one assumed the unlikely situation that such
detailed descriptions of data bases can be obtained and made
public.

Thus, while paying attention to monitoring the characteristics
of tape services, perhaps even more attention should be given to
the idea of surveying the customers themselves, actual and potential.
A great deal more respect should be paid to the
recommendations of such customers, and to the fact that in
proliferating diversity of technical design we are concerned with
the management of information as an important national resource.



References 

1. Carroll, K. D. Survey of Scientific-Technical Tape Services.
   American Institute of Physics, September, 1970.

2. Gehl, J., and Slamecka, V. "A Profile of Scientific-Technical
   Tape Information Services." To appear in Proceedings of
   the NBS/COSATI Forum on Information Analysis Centers,
   National Bureau of Standards, Gaithersburg, Md., May 1971.

3. Schwartz, J. "A Check List for the Examination of Data Base
   Systems." New York University, 1970. (Mimeographed.)



Automated Structuring of Natural Language Text 
M. Valach

Text structuring is a process by which the words and
concepts used in a given text T are cross linked using relations
given by both the grammar and the semantics of T. The following
steps are essential to this process: recognition of the words
and concepts of T; parsing and syntactical analysis of each
sentence; and recognition of intersentence relations, which
is the basis both for concept extraction (identification) and
for development of text structure linkages.

There are two points of view from which a text must be
interpreted as a structure, i.e., (1) the writer's viewpoint
and (2) the reader's viewpoint.

1. It is the writer of the text who is transmitting his
information via text. The structure of the text imposed by the
writer has to satisfy various requirements which in turn depend
on various circumstances. To name a few:

(a) To communicate the content of the text in a structure
    that comprises (directly or indirectly) all relations
    the author wants to give, where the elements of the
    structure are properly described, grouped, and mutually
    related.

(b) To relate his information to the knowledge of the
    assumed reader, so that the reader can more easily
    add the new information to whatever knowledge that
    reader already has.

(c) To provoke "side effects" in the reader, such as by
    arousing his curiosity, or providing satisfaction,
    or creating suspense, anxiety, or other emotional
    reactions.

Thus, the writer structures the text according to: 

* What he wants to communicate; 

* Who he assumes the reader to be; and 

* How he wants to present his information. 

2 . A second interpretation of the text structure takes place 
from the reader's point of view. Indeed, each and every reader 
may interpret the text as being structured in a different way. 

A few of the factors influencing the interpretation of the text 
by the reader are: 



* What he already knows about the subject; 

* The type of person he is; 

* How much confidence he has in the writer; 

* His motivation for reading; and 

* The mood he is in at the time of the reading.








The role of the reader is not only to recover the information
content of the text as it is given by the meanings and relations
among the words and concepts, but also to project or relate the
content to his own knowledge, and this task comprises a substantial
part of the process of interpreting the text. Thus, the writer's
structure of the content is modified during this process by the
reader's knowledge and the reader's habits of learning, and so the
procedure for recovering the content of the text is strongly
monitored by the reader.

Having this broad picture in mind, it can be seen that there
is not just one simple structure into which a text can be
converted by the text-structuring process; instead, text structuring is
merely one step in the interpretation (understanding) process, a
step resulting in a representation of the textual relations that
can form a preprocessed data base for later interpretative
processes. Yet the structuring itself has to be done with some
particular purpose in mind indicating later use of the resulting
structure.

It would therefore seem clear that the main goal should be 
to make the resulting structure of the text rich and complete 
enough for as broad a subsequent use as possible. The structure 
then becomes a new data base for the content-interpretation 
algorithms, which can be modified or monitored according to the 
goals of the interpretation rather than being restricted to more 
special-purpose structures at the beginning. 

The text interpretation processes normally start with word
classification and continue in the following steps:

Word recognition and lexical descriptions;

Sentence parsing;

Syntax of the sentence;

Context of the sentence;

Concepts in the text; and

Interpretation of the whole text in the context of
the interpreter's knowledge.

In other words, the scheme looks as follows:

WORDS -> PARSING -> SYNTAX -> CONTEXT -> CONCEPTS -> INTERPRETATION   (1)



It is well known that in analyzing the relations and structures at
a lower level, higher levels being in the direction of the arrows
in (1), ambiguities can be found where the clue to their
solution lies in higher levels. It seems to us to be very important
to realize that the analysis made at a lower level is not sufficient
if it results in merely the "most probable" structure, if in fact
more than one structure is possible. Our philosophy is that all
possible alternatives unsolved at the lower level should be retained
until such time as the clue justifying either their acceptance or
their rejection is found. This philosophy of approach provides
the motivation for certain requirements we have established for
our algorithms.

During the past year we developed the so-called Q-graph
technique for parsing of sentences. The possible functional
classes of the words in the sentence are matched with the Q-graph.
The technique finds the parsing of the sentence, including
any existing ambiguities (alternative interpretations). The
matching process, a quite simple and fast one, can also be used
as a model for interpretation of incomplete sentences. The model
clarifies the basis for expectation, suspense, and satisfaction
in the listener and diagnoses how the classification of the
analyzed part of the sentence needs to be rebuilt if a wrong
assumption (wrong path in the Q-graph) was followed.



The program for the parsing of simple sentences was
extended during the last year to the parsing of complex sentences
which included relative sentences (clauses) embedded at different
levels. The program recognizes the structure of the complex
sentence, including the relative clauses, assigns a number to
each sentence and outputs the corresponding tree of the numbered
sentences.

Another extension of the program which was developed transforms
a given sentence of the type described into an equivalent set of
simple sentences. Thus, Figures 1, 2, and 3 show an example of a
sentence comprising five relative clauses. In Fig. 1, the analyzed
sentence is shown in the second column; the fourth column shows
the vocabulary lookup allocation of possible functional classes.
For example, the word "coast" in line 22 can function either as
a noun singular (noun s), as an adjective (adj) or as a verb
(infini). The word "living" in line 14 can function either as
a noun (noun s), as an adjective (adj) or as a present
participle (-ing).
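
The vocabulary lookup step can be pictured as a table from word forms to their possible functional classes, with every alternative kept for the later stages. The sketch below is an illustration, not the project's program: the five entries are copied from Fig. 1, while the names `VOCABULARY`, `lookup` and `count_readings` are hypothetical.

```python
# A few entries copied from Fig. 1; the real vocabulary is not reproduced.
VOCABULARY = {
    "THE": ["THE"],
    "WHICH": ["RELACC", "RELNOM"],
    "IS": ["IS"],
    "LIVING": ["NOUN S", "ADJ.", "-ING"],
    "COAST": ["NOUN S", "ADJ.", "INFINI"],
}


def lookup(sentence):
    """Return (line number, word, possible classes) for every word."""
    return [(i, word.upper(), VOCABULARY.get(word.upper(), []))
            for i, word in enumerate(sentence.split(), start=1)]


def count_readings(rows):
    """Number of class combinations left open before any parsing is done."""
    total = 1
    for _, _, classes in rows:
        total *= max(len(classes), 1)
    return total


rows = lookup("which is living")
for row in rows:
    print(row)
print(count_readings(rows))
```

Keeping the full list of classes for each word, rather than one "most probable" class, is what allows a later stage to accept or reject alternatives once the necessary clue is found.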

Figure 2 shows the analyzed sentence after parsing into an
equivalent set of simple sentences. The second column contains
numbers showing in which group of the Q-graph the word was found;
the third column shows the parsing; the fourth column shows the
number of the simple sentence to which the corresponding word
belongs. The remainder of the figure shows the analyzed sentence,
where the words are shifted into different columns placing all
words of the simple sentence into the same columns. For example,



 1  THE          1  THE
 2  PICTURE      2  NOUN S  ADJ.    INFINI
 3  WHICH        5  RELACC  RELNOM
 4  WAS          7  WAS
 5  GIVEN        8  PAST P  PAST
 6  TO          10  PREP.   TO
 7  MY          12  ADJ.    PRONOU
 8  BROTHER     14  NOUN S  ADJ.    INFINI
 9  WHOM        17  RELACC
10  MY          18  ADJ.    PRONOU
11  FATHER      20  NOUN S  ADJ.    INFINI
12  WHO         23  RELNOM
13  IS          24  IS
14  LIVING      25  NOUN S  ADJ.    -ING
15  IN          28  PREP.
16  CALIFORNIA  29  NOUN S  ADJ.    INFINI
17  WHICH       32  RELACC  RELNOM
18  IS          34  IS
19  ON          35  PREP.
20  THE         36  THE
21  WEST        37  NOUN S  ADJ.    INFINI
22  COAST       40  NOUN S  ADJ.    INFINI
23  SENT        43  PAST
24  THE         44  THE
25  LAST        45  NOUN S  ADJ.    INFINI
26  MESSAGE     48  NOUN S  ADJ.    INFINI
27  WAS         51  WAS
28  PLACED      52  PAST P  PAST
29  IN          54  PREP.
30  THE         55  THE
31  OFFICE      56  NOUN S  ADJ.    INFINI
32  WHICH       59  RELACC  RELNOM
33  BELONGS     61  3PERS.
34  TO          62  PREP.   TO
35  CHARLIE     64  NOUN S  ADJ.    INFINI
36  .           67  .

Fig. 1






 1  1  THE     1  THE
 2  1  NOUN S  1  PICTURE
 3  5  RELNOM  2     WHICH
 4  2  WAS     2     WAS
 5  2  PAST P  2     GIVEN
 6  3  PREP.   2     TO
 7  3  ADJ.    2     MY
 8  3  NOUN S  2     BROTHER
 9  5  RELACC  3        WHOM
10  1  ADJ.    3        MY
11  1  NOUN S  3        FATHER
12  5  RELNOM  4           WHO
13  2  IS      4           IS
14  3  NOUN S  4           LIVING
15  3  PREP.   4           IN
16  3  NOUN S  4           CALIFORNIA
17  5  RELNOM  5              WHICH
18  2  IS      5              IS
19  3  PREP.   5              ON
20  3  THE     5              THE
21  3  ADJ.    5              WEST
22  3  NOUN S  5              COAST
23  2  PAST    3        SENT
24  3  THE     3        THE
25  3  ADJ.    3        LAST
26  3  NOUN S  3        MESSAGE
27  2  WAS     1  WAS
28  2  PAST P  1  PLACED
29  3  PREP.   1  IN
30  3  THE     1  THE
31  3  NOUN S  1  OFFICE
32  5  RELNOM  6                 WHICH
33  2  3PERS.  6                 BELONGS
34  3  PREP.   6                 TO
35  3  NOUN S  6                 CHARLIE
36  6  .       6                 .

Fig. 2





 1  1  THE     THE         1
 2  1  NOUN S  PICTURE     1
 3  5  RELNOM  WHICH       2
 4  2  WAS     WAS         2
 5  2  PAST P  GIVEN       2
 6  3  PREP.   TO          2
 7  3  ADJ.    MY          2
 8  3  NOUN S  BROTHER     2
 9  5  RELACC  WHOM        3
10  1  ADJ.    MY          3
11  1  NOUN S  FATHER      3
12  5  RELNOM  WHO         4
13  2  IS      IS          4
14  2  -ING    LIVING      4
15  3  PREP.   IN          4
16  3  NOUN S  CALIFORNIA  4
17  5  RELNOM  WHICH       5
18  2  IS      IS          5
19  3  PREP.   ON          5
20  3  THE     THE         5
21  3  ADJ.    WEST        5
22  3  NOUN S  COAST       5
23  2  PAST    SENT        3
24  3  THE     THE         3
25  3  ADJ.    LAST        3
26  3  NOUN S  MESSAGE     3
27  2  WAS     WAS         1
28  2  PAST P  PLACED      1
29  3  PREP.   IN          1
30  3  THE     THE         1
31  3  NOUN S  OFFICE      1
32  5  RELNOM  WHICH       6
33  2  3PERS.  BELONGS     6
34  3  PREP.   TO          6
35  3  NOUN S  CHARLIE     6
36  6  .       .           6

Fig. 3



"The picture was placed in the office" is sentence number 1, 
while "whom my father sent the last message" is relative clause
number 3.
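
The separation illustrated by this example amounts to collecting the words that carry the same sentence number. The sketch below reproduces it from the words and sentence numbers of Fig. 2; it is an illustration only, and the names `WORDS`, `SENTENCE_NUMBERS` and `separate` are hypothetical, not those of the project's program.

```python
# The analyzed sentence of Fig. 2 (final period omitted).
WORDS = ("THE PICTURE WHICH WAS GIVEN TO MY BROTHER WHOM MY FATHER "
         "WHO IS LIVING IN CALIFORNIA WHICH IS ON THE WEST COAST "
         "SENT THE LAST MESSAGE WAS PLACED IN THE OFFICE "
         "WHICH BELONGS TO CHARLIE").split()

# Sentence number of each word, read off the fourth column of Fig. 2.
SENTENCE_NUMBERS = ([1] * 2 + [2] * 6 + [3] * 3 + [4] * 5 +
                    [5] * 6 + [3] * 4 + [1] * 5 + [6] * 4)


def separate(words, numbers):
    """Group words by sentence number, keeping their original order."""
    sentences = {}
    for word, number in zip(words, numbers):
        sentences.setdefault(number, []).append(word)
    return {number: " ".join(ws) for number, ws in sentences.items()}


simple = separate(WORDS, SENTENCE_NUMBERS)
print(simple[1])  # THE PICTURE WAS PLACED IN THE OFFICE
print(simple[3])  # WHOM MY FATHER SENT THE LAST MESSAGE
```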

Figure 3 shows alternative parsings of the sentence. The
word on line 14 has possible parsings as a noun, adjective, or
present participle. Acceptance of any one parsing has to be
determined by some knowledge found only outside of the sentence itself.
Thus, consider the sentence:

Happiness is living in California

From this example we see that, if the subject is a state of the
mind, then "living" functions as a noun. In contrast, if the
subject is the name of a person, then "living" functions as a
present participle. In our original sentence it is a knowledge
of the meaning of the word "father" which is the out-of-sentence
clue to the proper interpretation.

Accomplishments

The name of the programming package which materializes our
approach to text structuring is English Text Processing (ETP).

The ETP package is designed as a four-part package containing
SEN, KNO, TEX and MOD, which will be described below.

Table 1 shows the full names of the packages and their
interior groupings, including the names of corresponding
subprograms. Asterisks mark those subroutines which have already
been written, debugged and tested. Plus signs indicate
subroutines which have not yet been programmed but which have
already received some consideration in our general concept of
text processing.

Figure 4 shows how the different parts of ETP are mutually
related. It is assumed that all of these use the same data base,
with each contributing to its growth in its own way.



Table 1

SEN Package

   SENTAL    ESR(*)      English Sentence Reader
             ESVL(+)     English Sentence Vocabulary Lookup
             GRAPAR(*)   Q-graph Parser of English Sentence

   SENSTR    PST(+)      Parsing to Syntax Transformation

   SENIN     SSS(+)      Simple Sentence Separation

TEX Package

   TEXAL     ISEL        Intersentence Linkup (resulting from
                         strictly grammatically worked relations)

   TEXSTR    AMEL        Ambiguity Elimination (resulting from
                         textual relations)

   TEXPRO                Text to Knowledge Projection

   TEXIN                 Text Interpreter

   TEXSYN                Text Syntax from Model Observation (+)

             TRAPER(+)   Transformation from One Person
                         to Another

MOD Package

   MODBU                 Model Builder

   MODAL                 Model Analyzer

   MOMA                  Model Manipulation

KNO Package

   KNOCO                 Knowledge Collection

   KNOUP                 Knowledge Update

   KNOSTR                Knowledge



[Diagram; legible labels: ESR, ESVL, GRAPAR, SSS, PST, SEN]

Fig. 4




The ESR subroutine (English Sentence Reader) was written, debugged
and used in various other programs in which English sentences are
inputted for analysis. ESR separates words and markers for vocabulary
lookup and further formatting of the sentence items. It reads sentences
from tape or cards.
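
The separation of words and markers performed by a sentence reader can be sketched with a regular expression. This is a present-day illustration under stated assumptions, not the ESR subroutine itself: the token pattern, the upper-casing of words, and the name `read_sentence` are all hypothetical choices.

```python
import re

# Assumed token pattern: a word (optionally with inner apostrophes or
# hyphens) or a single punctuation mark.
TOKEN = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*|[.,;:!?]")


def read_sentence(text):
    """Split a sentence into numbered items ready for vocabulary lookup."""
    return [(i, token.upper())
            for i, token in enumerate(TOKEN.findall(text), start=1)]


items = read_sentence("The picture was placed in the office.")
for item in items:
    print(item)
```

Numbering the items at this stage gives each word the line number by which the later figures refer to it.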

The ESVL subroutine (English Sentence Vocabulary Lookup),
incorporated into GRAPAR, performs lexical classification of sentence items
(words and punctuation marks), identifying possible classes in which
the particular word (in the actual form used in the sentence) may
function. ESVL is a preprocessing subroutine for the Q-graph parsing
method, used in GRAPAR.

GRAPAR (Q-Graph Parser for English Sentence) accomplishes the
following:

1. Parses each sentence of the text from its beginning to the
   next period;

2. Allocates a number to each simple sentence (clause) that is
   part of a complex sentence or that is a relative embedded
   sentence;

3. Constructs the tree structures in which each node corresponds
   to one sentence number (allocated under 2) showing the
   structure of simple and relative embedded sentences of the analyzed
   sentence; and

4. In case there are more possible parsing alternatives (i.e.,
   ambiguity resulting from different possible parsings) the
   analysis processes each alternative equally (as described in
   points 1, 2 and 3).
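
The tree of numbered sentences described in point 3 can be illustrated from the sentence numbers of Fig. 2. How GRAPAR actually builds the tree is not described in the text; the sketch below assumes a simple rule, namely that a newly opened clause is attached to the clause of the word immediately preceding it. The names `SENTENCE_NUMBERS` and `clause_tree` are hypothetical.

```python
# Sentence number of each word of the example sentence (from Fig. 2).
SENTENCE_NUMBERS = ([1] * 2 + [2] * 6 + [3] * 3 + [4] * 5 +
                    [5] * 6 + [3] * 4 + [1] * 5 + [6] * 4)


def clause_tree(numbers):
    """Map each sentence number to its parent (None for the main sentence).

    Assumed rule: a clause seen for the first time is a child of the
    clause to which the preceding word belongs.
    """
    parent = {}
    previous = None
    for number in numbers:
        if number not in parent:
            parent[number] = previous
        previous = number
    return parent


tree = clause_tree(SENTENCE_NUMBERS)
print(tree)
```

Under this assumption, clauses 2 and 6 hang directly from the main sentence 1, while clauses 3, 4 and 5 form a chain below clause 2, which agrees with the nesting visible in the example sentence.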

GRAPAR is the main subroutine of the group SEN which parses
sentences by using the Q-graph technique [1]. A Q-graph is stored in the
corresponding arrays in the memory and is considered to be part of input
data for the GRAPAR subroutine.

PST (Parsing to Syntax Transformation), a subroutine that transforms
a parsed sentence into a syntactical structure (more precisely, into a
sentence diagram SD), is under development. A special feature being
attempted in PST is the derivation of the sentence diagram SD by
transformation of the Q-graph into a corresponding sentence diagram
graph, called an SD-graph. If a successful, content-independent
transformation can be found (as hoped), then very simple,
straightforward, and fast rules will be the result. At this time,
however, the Q-to-SD transformation is not yet completed and will
require further research effort.



The SSS (Simple Sentence Separation) subroutine processes the tree
structure T of simple sentences (including relative sentences) output
by GRAPAR. The end result is the set ST of simple sentences
comprised in the original sentence OS.

The set ST of simple sentences together with the tree structure T
contains all information necessary for the recovery of the originally
analyzed sentence OS. Therefore ST and T are considered to be the result
of an equivalent transformation of the original sentence OS. The program
demonstrates that both of the following transformations are possible,
either:

     OS -> T, ST     (automated)

or:

     T, ST -> OS     (not yet automated)
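
The equivalence of the two transformations can be seen on a toy representation. The sketch below is an assumption-laden illustration, not the SSS program: the Clause class, the slot markers such as <2>, and both function names are hypothetical, and the recovery step is easy here only because the slots are stored explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class Clause:
    number: int                                # sentence number allocated by the parser
    parts: list = field(default_factory=list)  # word strings or embedded Clause objects


def separate(root):
    """OS -> T, ST: T maps a clause number to its embedded clause numbers,
    ST maps a clause number to its text with <n> slots for embedded clauses."""
    tree, simple = {}, {}

    def visit(clause):
        words, children = [], []
        for part in clause.parts:
            if isinstance(part, Clause):
                children.append(part.number)
                words.append(f"<{part.number}>")
                visit(part)
            else:
                words.append(part)
        tree[clause.number] = children
        simple[clause.number] = " ".join(words)

    visit(root)
    return tree, simple


def recover(tree, simple, number=1):
    """T, ST -> OS: refill each slot with the recovered embedded clause."""
    text = simple[number]
    for child in tree[number]:
        text = text.replace(f"<{child}>", recover(tree, simple, child))
    return text


os_tree = Clause(1, ["the man", Clause(2, ["who came yesterday"]), "left"])
T, ST = separate(os_tree)
```

Here ST holds the simple sentences and T records only which clause is embedded in which, so the pair carries the same information as the original sentence.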

PERTRA (Narrating Person Transformation). A preliminary study
has been conducted to collect the rules which govern the transformation
of narrative text from one person to another. The situation is as
follows: given a text narrated by one person P1, replace that person
by some other person P2 from the text and transform the text as if it
were narrated by P2. The interesting aspect of the situation is that
the transformation requires what we have called a "model of the
situations" brought out by the content of the text. The model helps
to determine the proper relations between P2 and the other described
objects, which are to be described or also transformed under the changed
situation.

A table has been established for grammatically correct context-free
transitions, and progress on the study continues.

In conclusion, we list some particularly interesting topics
which are related to the described research and which offer excellent
subjects for theses, seminars, or articles:

* Need of a "situation model" for text interpretation

* Two-step parsing using Q-graph

* Proposal for measuring the suspense created by the
  unfinished part of the sentence (using Q-graph
  approach)

* Development of patterns for learning the structure of
  the English sentence using Q-graphs

* Q-graph subgraphs as simplified English for the man-computer
  interface

* Change of the person narrating the same text

* Problem of the text syntax, created from the observation
  of the behavior of described subject

* Sentence as a vector of independent components, and
  its role in the text structuring process







References

1. Valach, M. "A Q-graph Approach to Parsing." To appear in:
   Proceedings of the Conference on Linguistics, University
   of Iowa, October, 1970.







Study of Concept-Based Grammars

D. Rogers

The purpose of this study has been to provide insight
towards synthesis (e.g., automatic abstracting) of natural-language
information. That the internal structure of the
sentence or smaller unit depends on the organization of the
entire text has become increasingly evident to linguists (e.g.,
Zellig Harris, Kenneth Pike, and Robert Longacre). This study
has partially analyzed the following interdependent features
of a text: topics and comments, analyses of verbal activities,
nominalizations of verbal activities, references to events
and references to nominals, and the identification and range
of various types of modalities.

In a text, a topic may be loosely defined as something
being focused on (i.e., a subject) and a comment may be defined
as a predication of that topic. For example, in an active-sentence
construction such as "the government recalls the ambassador,"
the topic is the agent "the government" and the comment
about the government is the predicate "recalls the ambassador."
On the other hand, in a passive-sentence construction such as
"the ambassador was recalled by the government," the topic "the
ambassador" is the object of the verbal activity, and the comment
"was recalled by the government" contains within it the lexical
expression of the agent of the verbal activity.

We should wish that a grammar adapted to synthesis of
natural-language information be at least able to recognize
topic and comment in terms of the placement of agent and
object markers. We can demonstrate the working of such a
grammar in respect to the following, which represents a
further analysis of the above two sentences.

(1)  TOPIC             COMMENT
     the government    recall      s            the ambassador
                       {agent}     (present)    [object]
                       {object}    [agent]

(2)  TOPIC             COMMENT
     the ambassador    is recall   ed           by the government
                       {agent}     (present)
                       {object}    [object]     [agent]







In this notation, the verbal primitive "recall" (or, more precisely,
the verbal root "call" with the particle "re-" = "back") is marked in
the lexicon (see footnote 1) with the possibility of opening slots for
the grammatical expression (marker) of the agent and of the object.
This is signified in the notation by the braces around agent and around
object beneath "recall." Thus the first stage in the explanation
of either (1) or (2) is the following:

(3)  VERBAL ROOT
     recall
     {agent}
     {object}



A set of rules concerning verbal roots allows selection of a
particular affix (form) to accompany the verbal root. However,
the rule placing an affix also specifies a semantic interpretation
on that affix. One of a set of modal affixes expressing semantic
features such as various tenses, possibility, necessity, question,
condition, permission, command, etc., is obligatorily placed after
the verbal root. In the analyses of both (1) and (2) a marker,
let us say L1, is placed after the verbal root in (3) with the
semantic interpretation of reference to the present. This reference
is indicated in the notation by parentheses:

(4)  VERBAL ROOT
     recall        L1
     {agent}       (present)
     {object}

This placement specifies (a) the modality present associated
with the verbal activity involving the verbal root "recall" and (b)
the potentiality for opening up a slot (or position) for the lexical
representation of that feature (e.g., "today," "now," etc., as
opposed to "yesterday").



1. This may be considered analogous to the lexicon marking in
   Fillmore, Charles J., "The Case for Case," Universals in
   Linguistic Theory, ed. by Emmon Bach and Robert T. Harms
   (New York: Holt, Rinehart and Winston, 1968), pp. 1-88.
   However, an alternative analysis may be that the rules
   defining such entities as agent, object, recipient, etc.
   are pragmatic in nature.






To this point the analyses of both (1) and (2) are the same.
However, a choice may be made to replace the marker L1 by "s," a
member of a set of finite-verb endings, to grammatically express
(indicated in the notation by square brackets) the third-person
singular agent as in (5) below, or by "ed," also a member of the
set of finite-verb endings, to grammatically express a third-person
singular object as in (6) below (see footnote 2). In both instances
the reference to the present tense is retained.



(5)  VERBAL ROOT
     recall        s
     {agent}       (present)
     {object}      [agent]
                   [3rd person]
                   [singular]

(6)  VERBAL ROOT
     recall        ed
     {agent}       (present)
     {object}      [object]
                   [3rd person]
                   [singular]



In the string (6) above the combination of (present), [object],
[3rd person] and [singular] obligatorily causes the placement of
an "is" before the verbal root:

(7)        VERBAL ROOT
     is    recall        ed
           {agent}       (present)
           {object}      [object]
                         [3rd person]
                         [singular]

Because the verbal root in the above is followed by a finite-verb 
ending, the string recalled is named a word. Likewise, the string 
in (5) is named a word because it ends in a finite-verb ending. 



2. Person and number are specified by sets of rules operating
   together to select (or analyze) an affix.



Because the form "s" has been placed to grammatically express
the agent in (5), the lexical representation of the agent must be
placed in a slot before the verbal unit:

(8)  the government    recall      s
                       {agent}     (present)
                       {object}    [agent]
                                   [3rd person]
                                   [singular]

We note that in the above the verbal root is marked with the
possibility of opening not only a slot for an agent but also a slot for
an object. Because a form has not been placed to express the object
in (8), the lexical representation of the object is placed in a
position after the verbal unit as in (9) below. We note that now
both the possibility for opening a slot for the grammatical expression
of the agent (i.e., the slot "s" of "recalls" as opposed to the lexical
representation "the government") and the possibility for opening a
slot for the grammatical expression of the object (i.e., the slot
after the verbal unit) associated with the verbal root have been
satisfied. This completes the explanation of (1).

(9)  the government    recall      s               the ambassador
                       {agent}     (present)       [object]
                       {object}    [agent]
                                   [3rd person]
                                   [singular]

Because the form "ed" has been placed to express the object in
(7), the lexical representation of the object must be placed in a
slot before the verbal root:

(10)  the ambassador    is recall    ed
                        {agent}      (present)
                        {object}     [object]
                                     [3rd person]
                                     [singular]

Because a form has not been placed to express the agent in (10),
the lexical representation of the agent with a preceding "by" is placed
in a position after the verbal unit. Thus, the following completes
the explanation of (2):



(11)  the ambassador    is recall    ed              by the government
                        {agent}      (present)       [agent]
                        {object}     [object]
                                     [3rd person]
                                     [singular]
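
The derivations of (1) and (2) can be condensed into a minimal sketch. It assumes a regular verbal root (so that the endings can simply be appended), a third-person singular topic, and present reference; the function names are hypothetical and the conditional rules of the grammar are reduced to a single two-way choice.

```python
def realize(root, agent, obj, topic):
    """Build the surface string for a present, third-person singular topic.

    topic == "agent":  the ending "s" expresses the agent, so the lexical
                       agent precedes the verbal unit and the object follows.
    topic == "object": the ending "ed" expresses the object, "is" is placed
                       before the root, and the agent follows with "by".
    """
    if topic == "agent":
        return f"{agent} {root}s {obj}"
    if topic == "object":
        return f"{obj} is {root}ed by {agent}"
    raise ValueError("topic must be 'agent' or 'object'")


def topic_and_comment(root, agent, obj, topic):
    """Split the realized string into its topic and its comment."""
    sentence = realize(root, agent, obj, topic)
    head = agent if topic == "agent" else obj
    return head, sentence[len(head) + 1:]


active = realize("recall", "the government", "the ambassador", "agent")
passive = realize("recall", "the government", "the ambassador", "object")
```

In both cases the topic is the entity whose function is expressed by the ending placed after the verbal root, which is the definition used in the text.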



In both (1) and (2) the topic is defined as that entity,
involved in the verbal activity, which has been expressed by the
finite-verb form placed immediately after the verbal root and
lexically placed in a position before the verbal unit. If,
instead of selecting the finite-verb form "s" in (5) or "ed" in
(6) to replace the L1 of (4), a non-finite verb ending, e.g.,
"ing" with reference to an agent, is selected, we obtain (12)
instead of (6). The rule retains the reference to the present
specified by the placement of the affix L1.

(12)  VERBAL ROOT
      recall        ing
      {agent}       (present)
      {object}      (agent)

Because neither a reference to the object nor the grammatical
expression of the object has been placed, the lexical representation
of the object is placed in a position after the verbal unit as in
(13) below.

(13)  VERBAL UNIT
      recall      ing          the ambassador
      {agent}     (present)    [object]
      {object}    (agent)

Because the reference to the agent has been placed (i.e., in
the replacement of L1 by "ing" in (12)), the lexical representation
of the agent must be placed in a slot before the verbal unit as in
(14) below.

(14)  the government    recall      ing          the ambassador
                        {agent}     (present)    [object]
                        {object}    (agent)



We note that in the above structure the topic is formally
defined as that entity, involved in the verbal activity, which has
been expressed by the form placed immediately after the verbal
root and lexically represented in a position before the verbal
unit. The structure (14) may be placed in initial position where
the string "the government" in (14) is the lexical representation
of an entity involved in another verbal activity having a structure
similar to (1) or (2). If the form placed immediately after
the verbal root, e.g., "declare," of this other activity expresses
the function of the lexical representation "the government" in this
other activity, the string (14) is placed in initial position with
appropriate commas as in (15) and the string "the government" is
the topic of both verbal activities.

                                                 VERBAL ROOT
(15)  the government, recalling the ambassador,  declare     s               war
                                                 {agent}     (present)       [object]
                                                 {object}    [agent]
                                                             [3rd person]
                                                             [singular]



However, if the form placed immediately after the verbal
root "declare" grammatically expresses the object, then the
structure (14) is placed with a preceding "by" after the verbal
unit "declared" as in (16) below. This provides for two topics,
the war and the government, with the additional information
that the second topic is included within the comment about the
first topic.

(16)  TOPIC   COMMENT
                                        TOPIC               COMMENT
      war     is declare   d            by the government,  recall     ing        the ambassador
              {agent}      (present)    [agent]             {agent}    (present)  [object]
              {object}     [object]                         {object}   (agent)
                           [3rd person]
                           [singular]









Besides formally defining topics and comments, this procedure
yields analyses of verbal activities expressed in surface strings.
Such analyses, stated not in terms of surface strings, permit the
recognition of the same verbal activity repeated as different
surface forms in other portions of the text.

The grammar that is capable of handling natural-language text
in the way demonstrated above consists of a set of conditional
rules and meta-rules concerning the order of the selection of the
conditional rules. These conditional rules place the potentialities
of the verbal root, such as the modal characteristics tense,
possibility, necessity, question, condition, etc., and the grammatical
expressions of the entities agent, object, recipient, etc., involved
in the verbal activity. The meta-rules concerning the order of
selection of conditional rules control the various manifestations
of the verbal roots and nominal entities, thereby monitoring the
deployment of meaning on the surface.

As can be seen from the above description, such a grammar would
provide a facility capable of synthesizing information
carried by natural-language text. Such a facility is able to
associate the relevant semantic interpretations with the surface strings
of the text.



References

1. Harris, Z. S. "Discourse Analysis." Language, 28:1-30
   (1952).

2. Harris, Z. S. "Linguistic Transformations for Information
   Retrieval." In: Proceedings of the International Conference
   on Scientific Information (1958), Vol. 2.
   National Academy of Sciences - National Research
   Council, Washington, D.C., 1959. pp. 937-950.

3. Pike, K. L. "Four Substitution Devices for Studying the
   Relation Between the Logical and the Grammatical
   Structures of Discourse." Paper presented at 1971
   Summer Meeting of the Linguistic Society of America,
   State University of New York at Buffalo, July 29-31,
   1971.







Toward a Theory of Mechanical Problem Analysis

M. Valach, H. J. Eiden

The object of this research has been to establish processes
for carrying out the analysis of diverse classes of problems in
terms of an arbitrarily selected set of problem-solving procedures
from various disciplines. The purpose of problem analysis
in this case is the possible production of a solution program
based on some subset of the selected procedures.

A "Problem Analysis Machine" (PAM) is postulated which is meant
to produce solution programs for problems given by a hypothetical
user (U). The actual execution of resultant programs is left as the
task of U. It is assumed that U is capable of executing a finite set
of procedures, part of which are primitive (irreducible to simpler
procedures), and the rest formed by various compositions of the
primitives. The procedure set is called π. PAM is then said to have
π-knowledge, i.e., for each member of π PAM has stored a description
of the initial situation for which the procedure is normally elicited
and a description of the resulting situation following procedure
application. In addition to this pair of descriptions, called IS and
GS, there is a description M of the procedure itself, sufficient to
distinguish it from other procedures having the same IS and GS, and a
particular name by which the procedure is referred to. A particular
member of π-knowledge is called a transition rule and is symbolized
by a quadruple (IS, GS, M, t_i), the elements of which refer to the
types of descriptive information mentioned above, respectively.

For any member of the transition rules {t_i}, the descriptive
information of each IS, GS, and M is symbolized by (E, R, A_E, A_R)
and is called a λ-description. Such a quadruple consists of a
naming of elements; relations that hold between these elements;
attribute values assigned to the elements; and attribute values
assigned to the relations. The transition rules are clustered
in π-knowledge according to discipline; each such cluster is
called a π-domain. Although overlapping, π-domains are
distinguished by their λ-vocabularies, i.e., by the element, relation,
and attribute names used in transition rules.

A problem statement is given to PAM as a triplet (IS, GS, C)_P,
where each member of the triplet is expressed in λ-format. C is
an expression of the constraints set on M and/or any intermediate
situations arising out of the serial application of t_i's. It is the task
of PAM to attempt to construct a solution program for a given problem
statement in terms of {t_i}. Such a program may include branching,
looping, or recursion, depending upon the richness of the set
{t_i}. A solution program defines a method of transformation of
an IS_P to a GS_P under the constraints C.
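
A minimal sketch of the simplest case follows. It is not the PAM design: situations are reduced to sets of plain strings instead of full (E, R, A_E, A_R) descriptions, the constraint C is an arbitrary predicate on intermediate situations, the search is breadth-first, and only straight-line programs (no branching, looping, or recursion) are produced. The class and function names and the example rules are assumptions for illustration.

```python
from collections import deque
from typing import NamedTuple


class Rule(NamedTuple):
    IS: frozenset   # initial situation for which the procedure is elicited
    GS: frozenset   # situation resulting from applying the procedure
    M: str          # description of the procedure itself
    name: str       # name t_i by which the procedure is referred to


def solve(is_p, gs_p, rules, constraint=lambda situation: True):
    """Search for a sequence of rule names that transforms IS_P into a
    situation containing GS_P while every intermediate situation
    satisfies the constraint. Returns None if no sequence is found."""
    start, goal = frozenset(is_p), frozenset(gs_p)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        situation, program = queue.popleft()
        if goal <= situation:
            return program
        for rule in rules:
            if rule.IS <= situation:
                # Applying a rule replaces its IS facts by its GS facts.
                nxt = (situation - rule.IS) | rule.GS
                if nxt not in seen and constraint(nxt):
                    seen.add(nxt)
                    queue.append((nxt, program + [rule.name]))
    return None


RULES = [
    Rule(frozenset({"water cold"}), frozenset({"water hot"}), "heat on stove", "t1"),
    Rule(frozenset({"water hot", "tea leaves"}), frozenset({"tea"}), "steep", "t2"),
]
program = solve({"water cold", "tea leaves"}, {"tea"}, RULES)
```

The returned list plays the role of a solution program that the user U would execute; a constraint that rejects a necessary intermediate situation makes the problem unsolvable in this rule set.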



To accomplish its task, PAM executes a problem analysis program
which operates on the problem statement and π-knowledge, with the
aid of what is called a reinterpretation dictionary, a picture maker,
and a collection of rules of inference {r_i}. In its simplest mode
of operation, the analysis program proceeds much as one would carry
out a derivation in formal logic; i.e., t_i's are selected and applied to
IS_P, or inversely to GS_P, in a manner like the application of theorems
or rules of inference. In more advanced modes, PAM performs either
description-improvement or domain-shifting. Both processes are
performed on any of the members of (IS, GS, C)_P, any of IS_{t_i},
GS_{t_i}, and M_{t_i}, or on any intermediate situation description
occurring in the midst of application of a series of t_i's.
Description improvement may be obtained with the use of any of the
three aids listed above. In this case the reinterpretation dictionary
provides a listing of equivalent terms in any particular λ-vocabulary.
The picture maker accepts an old description, converts it into a whole
graphic image, accepts the names of new elements and relations, and
attempts to find these new features in the image, hence modifying the
old description. The rules of inference are clustered in direct
correspondence with π-knowledge, and represent the theorems and
inference rules by which one situation may be implied from another in
a given discipline. These rules are expressed as triplets
(IS, GS, r_i), whose members are expressed in the same way as those in
the t_i's of the particular domain.

Domain-shifting is performed for the express purpose of changing the
description of a situation, constraint, or M to that of some
corresponding entity in an alternate disciplinary domain. The suggested
alternate domain is indicated by a correspondence of vocabulary in
the reinterpretation dictionary, or by a requirement reflected in
the user's problem statement; i.e., IS_P and GS_P are classified in
different domains. A domain-shift may be carried out by one of two
methods. On the one hand, domain-shifting rules may exist in {t_i}
which take one across domain boundaries; on the other, where such
rules do not exist, the analysis program performs what is called
"coding." Coding is simply the process of mapping the names (elements,
relations, attributes) of all or part of one description onto
those of another, where such a mapping may or may not be one-to-one.
The overall purpose of domain-shifting is twofold. It is invoked in
one case when a solution is allowed to be multidiscipline; in the
second case, it is used to discover solution programs in an alternate
domain that exhibit by analogy the solution program structure required
in the domain of interest.

The goals of this research do not include the actual design of
the system outlined here; rather the intention is to establish and
substantiate necessary and sufficient processes. Complete
documentation of this work will appear in a forthcoming Ph.D.
dissertation [3].




References

1. Amarel, S. "On the Representation of Problems and Goal-Directed
   Procedures for Computers." Communications of
   the American Society for Cybernetics, Vol. 1, No. 2,
   1969, pp. 9-36.

2. Banerji, R. B. Theory of Problem Solving: An Approach to
   Artificial Intelligence. New York, American Elsevier,
   1969.

3. Eiden, H. J. Toward a Theory of Mechanical Problem Analysis.
   Atlanta, Ga., Georgia Institute of Technology. Ph.D.
   Thesis. (Forthcoming.)



STUDIES IN INFORMATION PROCESSING AND SYSTEMS DESIGN 



Interactive Preparation of State-of-Art Reviews
Extending the Utility of Science Information to Education
An Adaptive Spectrum Analysis Vocoder
Computer Picture Processing
Computer Structure for Description of Pictures
Optimal Simultaneous Flow in Single Path Communications Networks
Pre-scheduler and Management Model for Computer-User Systems
Multiprogramming Scheduling
Problems in Operating Systems Design



Interactive Preparation of State-of-the-Art Reviews

P. J. Siegmann, D. E. Rogers, J. Reutter

In response to the rapidly expanding volume of new technical
literature produced in recent years, means have been sought for
making literature reviewing resources more effective. One such
means would be the development of automated tools to assist reviewers
during the analysis and composition phases of reviewing, so that
computerized aids could serve to eliminate the more clerical tasks
of reviewing and thus free the reviewer for the more creative aspects
of his role. In order to investigate the feasibility of implementing
an automated approach to the writing of review articles, we have in
this research effort specified preliminary designs for a prototype
system.

The prototype system has the following features: on-line access
to a central file of technical literature for purposes of review;
interactive text-editing facilities for constructing reviews and
abstracts via CRT terminals; a facility for temporary filing of
partially constructed reviews; protection of source documents and
working reviews from unauthorized modification; and recording of
actions taken by the reviewer during the reviewing process.

The suggested system consists of a CRT terminal (for document
display, text entry and text editing), a central processing unit,
a secondary storage file system, and a set of programs. The system
has four modes of operation: retrieval mode, print mode, marking
mode and editing mode. Retrieval mode provides the terminal user
the ability to retrieve any document to which he has authorized
access; once retrieved, depending on the class of document, a
document may be marked, or edited and then refiled as modified.
Print mode is used to print one or more copies of any formatted
document at any on-line printer terminal. Marking mode is used
to mark key words or phrases and to enter commentary among the
text of an existing document. Editing mode is used for entering
new documents, formatting or editing existing documents, or
rearranging text in existing documents.

There are two types of storage in the system: file storage
and working storage. File storage is used to permanently store
documents. Working storage exists only while a reviewer is using
the system; a block of such storage is dedicated to each terminal
user while he is connected to the system. The working storage
block is divided into one partition containing structured lists
and another containing the actual document text.










Software for the prototype system is comprised of nine major
components (see also Fig. 1):

* Communication Control (for terminal interaction,
  page buffering)

* Text Parser (for recognition of commands, syntax
  checking)

* Command Handler (for subroutine sequencing, posting
  to the history file)

* Marker Mode Handler (for control of commentary entry
  and keyword marking of source document)

* Document Editor (for performing editing functions
  not provided by terminal)

* Document Formatter (for performing formatting
  functions specified by terminal operator)

* Print Control (for providing off-line printing
  functions, such as reformatting, suppression
  of embedded control numbers and format statements,
  etc.)

* File Interface (for filing of new documents, protection
  of documents in secondary storage, and retrieval
  of documents)

* Document Transformer (for construction of working
  storage list structures during read-in of a
  document to be edited).



Formatted documents are structured into sections, paragraphs
and sentences. Sections are uniquely numbered within each document,
paragraphs within each section, and sentences within each
paragraph. Working storage list structures consist of a section
list with forward linked sections, forward linked paragraphs, and
forward linked sentences as shown in Fig. 2. Each sentence entry
contains a pointer to the core location of the first character of
text for that sentence.
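
The list structure described above can be sketched as three forward-linked record types plus a text partition. This is an illustration under assumptions, not the prototype's storage layout: the class and field names are hypothetical, and a character offset into one concatenated string stands in for the pointer to a core location.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sentence:
    number: int                          # unique within its paragraph
    text_start: int                      # offset of the first character in the text partition
    next: Optional["Sentence"] = None    # forward link


@dataclass
class Paragraph:
    number: int                          # unique within its section
    first_sentence: Optional[Sentence] = None
    next: Optional["Paragraph"] = None


@dataclass
class Section:
    number: int                          # unique within the document
    first_paragraph: Optional[Paragraph] = None
    next: Optional["Section"] = None


def build(document):
    """Build the list partition and the text partition from a document given
    as a list of sections, each a list of paragraphs, each a list of
    sentence strings. Returns (first section, concatenated text)."""
    text_parts, offset = [], 0
    head = prev_section = None
    for s_no, paragraphs in enumerate(document, start=1):
        section = Section(s_no)
        if prev_section is None:
            head = section
        else:
            prev_section.next = section
        prev_section = section
        prev_paragraph = None
        for p_no, sentences in enumerate(paragraphs, start=1):
            paragraph = Paragraph(p_no)
            if prev_paragraph is None:
                section.first_paragraph = paragraph
            else:
                prev_paragraph.next = paragraph
            prev_paragraph = paragraph
            prev_sentence = None
            for n, sentence_text in enumerate(sentences, start=1):
                sentence = Sentence(n, offset)
                if prev_sentence is None:
                    paragraph.first_sentence = sentence
                else:
                    prev_sentence.next = sentence
                prev_sentence = sentence
                text_parts.append(sentence_text)
                offset += len(sentence_text)
    return head, "".join(text_parts)


head, text = build([[["Ab.", "Cde."], ["Fg."]], [["Hi."]]])
```

Keeping the links separate from the text means that rearranging sentences only rewrites links and numbers in the list partition, while the text partition can stay where it is.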




Fig. 1. Schematic of Prototype System Software

Fig. 2. List Structure for Working Storage



The user terminal proposed for the prototype system is a CRT
device which handles most editing functions in a local mode. For
laboratory purposes it would be desirable to have a programmable
CRT memory, since this would allow experimentation with variations
in the editing functions to be supported by the device. The display
buffer should be large enough to allow both rollup and rolldown,
within reasonable limits. The screen should be capable of displaying
a minimum of 1,800 characters of text in a single display.

Estimated hardware costs of the prototype system are $250/hour
for CPU usage, and $5,000 for a CRT terminal. Estimated programming
requirements for designing, coding and testing the nine software
modules (containing an estimated 10,000 source statements if written
in a high-level programming language with the facilities of PL/1)
would be two man-years.

The study concluded that the prototype costs are prohibitive 
at this time. 



References

1. Reutter, J. "A Study of the Feasibility of Implementing a
   System for the Interactive Preparation of State-of-the-Art
   Reviews." Unpublished memorandum, Georgia Institute
   of Technology (School of Information and Computer Science),
   Atlanta, Ga., 1971.







Extending the Utility of Science Information to Education

V. Slamecka, A. P. Jensen 

In attempts to extend the utility of science information data 
bases, the Georgia Tech Science Information Research Center pursues 
the potential of using such automated data bases in the process of 
self-instruction. The initial technical prerequisite has been the 
design and empirical evaluation of a prototype of a "knowledge 
utility" as a mechanism for the delivery of a nontrivial portion 
of the educational requirements . This prototype utility for self- 
instruction has been given the name "Audiographic Learning Facility 
(ALF)." 

The concept of a self-instruction system, principally characterized
by the absence of the live instructor as the primary and
formal transmitter of knowledge, is shown schematically in Fig. 1.




Fig. 1. The Self-Instruction System 

The major components of this system are an inanimate, structured
Memory for storing learning materials in a form suitable for
transmission and for perception by remotely located learners, and a
programmed Preceptor controlling the transmission.

The control over the process of self-instruction is partially
vested in the programmed Preceptor, and in part it resides with the
Learner. User-imposed control over the system is of two types.
On-line control gives Learner the ability to start, stop and repeat
a presentation, and to jump at any time to any other learning unit
in the system. Using these commands, Learner can override the
selection of learning units offered by Preceptor, and in such a
manner participate, on-line, in the design of his learning strategy.
The second control mechanism interposes between Learner and Preceptor
the services of a human tutor; it is tantamount to an appointment
or a conference with a teacher prior to overriding the programmed
Preceptor. This type of control will usually incur a time
delay.

The self-instruction system operates in two modes : scheduled 
and on-demand. Both modes of self-instruction can serve, optionally, 
either group audiences (e.g., a class) or individual learners. 

The basic distinction of this pilot self-instruction system
from other mechanized learning systems is in its storage of
narrative-speech and line graphic "blackboard" lessons as the modular
contents of Memory, and in its capability of actively involving
Learners in the design of their learning strategies. The communication
between Preceptor and Learner, and the transmission of audiographic
learning materials, employ standard telephone lines. The implemented
hardware system of the Audiographic Learning Facility has a capacity of
approximately 120 hours of audiographic lectures, and it supports four
remote, on-line learning sites. A limited version of the Preceptor
software has been written.

Guidelines have been issued for the preparation of educational 
materials in audiographic storage form relative to (a) the identifica- 
tion of major concepts and learning goals, employing a directed graph 
approach; (b) the decomposition of major concepts into lessons and 
strategies, using precedence graphs to represent the latter; (c) 
lesson writing and recording; (d) preparation of introductory lessons 
explaining structures of learning concepts and strategies for particular 
learning goals; and (e) indexing learning materials for the purpose 
of updating the syndetic data-base aids . 
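
The use of a precedence graph to represent a learning strategy, as in guideline (b), can be sketched with the Python standard library. The lesson names and their prerequisites are hypothetical, and ordering lessons by a topological sort is one possible reading of a strategy, not the Preceptor's actual method.

```python
from graphlib import TopologicalSorter

# Hypothetical precedence graph: each lesson maps to the lessons
# that must be presented before it.
PRECEDENCE = {
    "loops": {"variables"},
    "arrays": {"variables"},
    "sorting": {"loops", "arrays"},
}


def lesson_order(precedence):
    """Return one presentation order that respects every precedence relation."""
    return list(TopologicalSorter(precedence).static_order())


order = lesson_order(PRECEDENCE)
```

Several orders may satisfy the same graph; a Learner's jump command can be viewed as choosing a different admissible order, or as deliberately departing from the graph.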

Several introductory courses have been recorded by this method 
during the past year, including two courses on computer organization 
and programming, a course in discrete structures, and one in introduc- 
tory cybernetics. Additional courses under recording at this time cover 
the subjects of environmental technology, world civilization, and a 
comparative study of programming languages . 




In its more advanced form, the Preceptor is itself a learning,
self-organizing system striving to optimize its functions on the basis
of certain categories of feedback/commands received from Learners.
Among its other functions are monitoring Learner performance and
collecting appropriate data useful for the management of the system.


During the past year the Audiographic Learning Facility has
been tested empirically as the primary medium of delivery of learning
materials in a six-week Summer Institute in Information/Computer
Science for High School Teachers. During these six weeks, 20 high
school teachers used ALF in a self-instruction (group) mode for
four consecutive hours per day, followed by four-hour periods of
discussion and tutoring daily. Several methods of lecture recording
have been tested: recording prior to presentation has averaged
approximately 3 man-hours of faculty time per hour of recorded
instruction; recording in a live classroom consumes little more than
the time of the live lecture.

This empirical, real-world use of ALF has been exceedingly
successful. A remote application of ALF will take place in Fall 1972
between Georgia Tech and West Georgia College in Carrollton, Ga.

Among the key objectives of this project, as delineated in the
Annual Progress Report for 1969/70, is the study of the extended use
of science information banks for the semiautomatic creation and
updating of "knowledge utilities" for education. The project will be
approaching this study phase in the coming year, and will seek to
devise methods of semiautomatic mapping of information from a science
information bank into ALF.



References 

1. School of Information and Computer Science. Self-instruction
   Systems: An Alternate Socio-Technological Approach to
   National Education and Training. Atlanta, Ga., Georgia
   Institute of Technology (School of Information and Computer
   Science), 1971. Research Report and Prospectus, 13 p.

2. Slamecka, V., Jensen, A. P., Valach, M., and Zunde, P. "A
   Computer-Aided Multisensory Instruction System for the Blind."
   Submitted for publication.

3. Slamecka, V., Jensen, A. P., and Zunde, P. "An Audiographic
   Repository of Knowledge for Conversational Learning." To
   appear in Educational Technology.











An Adaptive Spectrum Analysis Vocoder 



J. C. Hammett 

The objective of this research was to improve the performance
of modern speech bandwidth compression systems. The properties of
the vocoder (voice-coder) were examined, a potential improvement
was proposed and incorporated into a vocoder design, and the
performance of the resulting system was evaluated by computer
simulation [1].

The phonemes of speech display a wide range of time-frequency
properties, due to the extremes in the articulatory dynamics of
speech production. The vocoder is based on the simplified model
of speech production, an essentially "stationary" model. The
relative validity of the model may be improved by matching the
duration of the analysis window function to intervals of speech
which are indeed "stationary." These observations motivated
the design and experimental evaluation of a vocoder which adapts
its time-frequency resolution properties to match the relative
stationarity of different segments of input speech.

The homomorphic vocoder was selected as a test platform to
evaluate the adaptive spectrum analysis strategy. The homomorphic
vocoder was a natural choice for the simulation because its time and
frequency properties may be readily manipulated: time resolution
by the duration of the analysis window function, and frequency
resolution by the number of cepstrum coefficients transmitted.
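
The two controls named above can be sketched with a real cepstrum
computed on one windowed frame. This is a minimal illustration, not the
simulation system described in this report: the Hamming window, the
naive DFT, the 64-sample frame and all function names are assumptions
made for the example.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def hamming(n):
    """Hamming window of length n (an assumed choice of window function)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log magnitude spectrum of a frame."""
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
    log_mag = [math.log(abs(c) + 1e-12) for c in dft(windowed)]
    return [c.real for c in idft(log_mag)]


def truncate_cepstrum(cep, keep):
    """Keep the first `keep` coefficients and their mirror images; zero the rest."""
    n = len(cep)
    return [c if (i < keep or i > n - keep) else 0.0 for i, c in enumerate(cep)]


def smoothed_log_spectrum(cep):
    """Log magnitude spectrum implied by a (possibly truncated) cepstrum."""
    return [c.real for c in dft(cep)]


# The frame length sets time resolution; `keep` sets frequency resolution.
frame = [math.sin(0.37 * t) + 0.5 * math.sin(1.1 * t) for t in range(64)]
cep = real_cepstrum(frame)
coarse = smoothed_log_spectrum(truncate_cepstrum(cep, 10))
```

Because the real cepstrum of a real frame is symmetric, only the first
`keep` values would need to be sent; the mirror images are restored at
the receiver before the spectrum is rebuilt.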

An adaptive homomorphic vocoder was designed and a simulation
system implemented on a large-scale digital computer. Experimental
runs with the adaptive homomorphic vocoder were made with three
test sentences, and the synthesized speech was judged in informal
subjective listening tests. In one experiment (with a female
talker) two adaptive modes were employed with window durations
of 12.8 or 25.6 ms, frame intervals of 10 or 20 ms, and cepstrum
truncation to 10 or 20 coefficients, respectively. The spectrum
data rate was reduced to 3700 b/s and the synthesized speech judged
to be of high "quality," retaining naturalness and recognition
properties.
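
A short calculation shows why the two modes can share one data rate. It
reads "respectively" as pairing the 12.8 ms window with the 10 ms frame
interval and 10 coefficients, and the 25.6 ms window with the 20 ms
interval and 20 coefficients. The bits-per-coefficient figure further
assumes that the whole 3700 b/s is spent on cepstrum coefficients, which
this summary does not state.

```python
# Mode parameters as read from the experiment description (pairing assumed).
MODES = {
    "short window": {"window_ms": 12.8, "frame_interval_ms": 10.0, "coefficients": 10},
    "long window": {"window_ms": 25.6, "frame_interval_ms": 20.0, "coefficients": 20},
}
SPECTRUM_RATE_BPS = 3700


def coefficients_per_second(mode):
    """Cepstrum coefficients produced per second of speech in one mode."""
    frames_per_second = 1000.0 / mode["frame_interval_ms"]
    return frames_per_second * mode["coefficients"]


for name, mode in MODES.items():
    rate = coefficients_per_second(mode)
    # Average bits per coefficient under the stated assumption.
    print(name, rate, SPECTRUM_RATE_BPS / rate)
```

Under these assumptions both modes produce 1000 coefficients per second,
so switching modes trades time resolution against frequency resolution
without changing the coefficient rate.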

Two additional experiments (with male talkers) used window
durations of 10 or 20 ms and a 3700 b/s data rate. The first
of these resulted in synthesized speech judged to be of high
"quality" but slightly less natural than the earlier result. The
last experiment was conducted with a test sentence composed of
voiced, non-nasal phonemes, which displayed no transitions in the
spectrum rapid enough to warrant use of the 10 ms window mode, so
the simulation operated as a conventional homomorphic vocoder with
a 20 ms window. The result was judged to be reasonably good, but
not quite as good as the two previous results.

The tentative conclusion of the experimental phase of the
investigation is that the adaptive strategy has potential for
reducing vocoder data rates, while maintaining intelligibility,
speaker recognition, and naturalness properties.



References

1. Hammett, J. C., Jr. An Adaptive Spectrum Analysis Vocoder.
   Atlanta, Ga., Georgia Institute of Technology, 1971.
   142 p. (Ph.D. Thesis.)



Computer Picture Processing 

M. D. Kelly, D. K. Smith 

The primary goal of this research has been the development
and improvement of techniques for computer picture processing.
(The term "picture processing" is used to denote the processing
of pictures obtained from the outside world, and includes areas
often called "pictorial pattern recognition" and "picture analysis
and description.") Our most recent work has focused on edge
detection and the application of Fourier optics to digital pictures.

Our work on edge detection has been an attempt to provide a 
unifying framework from which one can evaluate the widely varying 
edge detection algorithms which have been reported [3,4]. This has
led to the development of criteria by which the effectiveness of 
these algorithms can be measured. 

The measures developed for comparing edge detection operators
were:

* Computation time 

* Sensitivity to slope 

* Sensitivity to width 

* Ability to detect as distinct edges which are 
close together 

* Sensitivity to noise 

* Tendency to indicate unwanted edges in areas of 
gradually varying light intensity 

Programs have been developed for evaluating edge detection
operators according to the above criteria. Operators which have
been tested include:

* Four point approximation to gradient 

* Nine point approximation to gradient 

* Five point approximation to Laplacian 

* Nine point approximation to Laplacian

* The multiplicative operator of Rosenfeld [3].
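
For orientation, the sketch below gives common textbook forms of three
of the listed operators and applies them to a synthetic step edge. The
report does not give the masks that were tested, so these definitions
and all function names are assumptions; the multiplicative operator is
omitted because its definition is not given here.

```python
def gradient_4pt(img, r, c):
    """Four-point (2x2, Roberts-style) gradient magnitude estimate at (r, c)."""
    d1 = img[r][c] - img[r + 1][c + 1]
    d2 = img[r][c + 1] - img[r + 1][c]
    return abs(d1) + abs(d2)


def laplacian_5pt(img, r, c):
    """Five-point Laplacian: four neighbours minus four times the centre."""
    return (img[r - 1][c] + img[r + 1][c] + img[r][c - 1] + img[r][c + 1]
            - 4 * img[r][c])


def laplacian_9pt(img, r, c):
    """Nine-point Laplacian: eight neighbours minus eight times the centre."""
    total = sum(img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1))
    return total - 9 * img[r][c]


def apply_operator(op, img, margin=1):
    """Apply `op` at every interior point; border points are left at zero."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(margin, rows - margin):
        for c in range(margin, cols - margin):
            out[r][c] = op(img, r, c)
    return out


# Synthetic vertical step edge: four dark columns, then four bright columns.
step = [[0] * 4 + [10] * 4 for _ in range(6)]
for op in (gradient_4pt, laplacian_5pt, laplacian_9pt):
    print(op.__name__, apply_operator(op, step)[2])
```

On this step the gradient estimate responds at one position, while both
Laplacians give a positive and a negative response on either side of
the edge.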

The programs described above were written in standard
algorithmic languages. In addition, the PAX language [2] for computer
picture processing has been extensively tested to determine its
merits for efficient development and evaluation of picture
processing operators.

We have investigated applications of Fourier optics for
computer picture processing. The spatial frequency domain provides
a useful measurement space for the development and comparison of
techniques for image enhancement and bandwidth compression.
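
One simple way to work in the spatial frequency domain is to transform a
digital picture, discard the high-frequency coefficients, and transform
back. The sketch below is an illustration only, not the technique
studied in this project; the naive two-dimensional DFT, the square
low-pass rule and the function names are assumptions.

```python
import cmath
import math


def dft2(img):
    """Naive two-dimensional discrete Fourier transform of a small picture."""
    rows, cols = len(img), len(img[0])
    return [[sum(img[r][c] * cmath.exp(-2j * math.pi * (u * r / rows + v * c / cols))
                 for r in range(rows) for c in range(cols))
             for v in range(cols)]
            for u in range(rows)]


def idft2(spec):
    """Inverse transform; returns the real part of each reconstructed pixel."""
    rows, cols = len(spec), len(spec[0])
    return [[sum(spec[u][v] * cmath.exp(2j * math.pi * (u * r / rows + v * c / cols))
                 for u in range(rows) for v in range(cols)).real / (rows * cols)
             for c in range(cols)]
            for r in range(rows)]


def low_pass(spec, cutoff):
    """Zero coefficients whose wrapped frequency index exceeds `cutoff` on either axis."""
    rows, cols = len(spec), len(spec[0])

    def keep(u, v):
        return min(u, rows - u) <= cutoff and min(v, cols - v) <= cutoff

    return [[spec[u][v] if keep(u, v) else 0 for v in range(cols)]
            for u in range(rows)]


# Synthetic vertical step edge: four dark columns, then four bright columns.
step = [[0] * 4 + [10] * 4 for _ in range(6)]
spectrum = dft2(step)
blurred = idft2(low_pass(spectrum, 1))
```

Keeping fewer coefficients reduces the amount of data describing the
picture at the cost of blurring sharp edges, which is the trade-off a
bandwidth compression scheme has to evaluate.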



